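Market basket analysis, also known as association analysis, mines large volumes of transaction data for hidden business rules that link items together; the best-known example is the beer-and-diapers story. Below, we use R to walk through a simple example on the in-store sales data of a Tainan beef soup restaurant (data updated through 2024/04).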

First, install and load the arules package.

# install.packages("arules", repos="http://cran.us.r-project.org")
library("arules")  
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write

Next, read the store's sales data and use summary to get an overview.

cow <- read.transactions("cow.csv")                                
summary(cow)     
## transactions as itemMatrix in sparse format with
##  1143 rows (elements/itemsets/transactions) and
##  50 columns (items) and a density of 0.05989501 
## 
## most frequent items:
##    溫體牛肉湯      溫體牛肉    芥蘭炒牛肉 牛肉貢丸(7粒)    滑蛋炒牛肉 
##           365           283           227           221           200 
##       (Other) 
##          2127 
## 
## element (itemset/transaction) length distribution:
## sizes
##   0   1   2   3   4   5   6   7   8   9  11 
##  30 189 311 233 172 106  55  29   8   6   4 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   3.000   2.995   4.000  11.000 
## 
## includes extended item information - examples:
##       labels
## 1   三杯牛尾
## 2 三菇炒牛肉
## 3   牛三層肉
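
The density and mean transaction size in the summary can be cross-checked from the length distribution alone. The following is a minimal base R sketch that copies the `sizes` table from the output above; the variable names are illustrative and not part of arules.

```r
# Length distribution copied from the summary(cow) output above
sizes  <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11)
counts <- c(30, 189, 311, 233, 172, 106, 55, 29, 8, 6, 4)

n_transactions <- sum(counts)          # rows of the sparse matrix
n_items        <- 50                   # columns (distinct items)
n_purchases    <- sum(sizes * counts)  # non-empty cells in the matrix

mean_size <- n_purchases / n_transactions
density   <- n_purchases / (n_transactions * n_items)

print(c(transactions = n_transactions, purchases = n_purchases))
print(round(mean_size, 3))
print(round(density, 8))
```

The distribution also lists 30 transactions of size 0, which may be worth checking against the source file.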

Show the first eight transactions of the order data:

inspect(cow[1:8])     
##     items                                                               
## [1] {青椒炒牛肉, 綜合湯}                                                
## [2] {牛肉貢丸(7粒), 加肉, 玉米, 娃娃菜, 高麗菜, 溫體牛肉3+1, 滑蛋炒牛肉}
## [3] {芥蘭炒牛肉, 溫體牛肉湯, 滑蛋炒牛肉}                                
## [4] {溫體牛肉湯}                                                        
## [5] {溫體牛肉湯}                                                        
## [6] {三菇炒牛肉, 溫體牛肉湯}                                            
## [7] {溫體牛肉湯, 滑蛋炒牛肉}                                            
## [8] {溫體牛肉湯, 滑蛋炒牛肉, 綜合湯}

Besides looking at the items themselves, we can use the size function to see how many distinct items each order contains.

size(cow[1:8])   
## [1] 2 7 3 1 1 2 2 3

Next, we use the itemFrequency function to list the proportion of transactions that contain each item, which also reveals the most frequently ordered items.

itemFrequency(cow)
##       三杯牛尾     三菇炒牛肉       牛三層肉     牛三層肉湯           牛心 
##   0.0017497813   0.0218722660   0.0262467192   0.0489938758   0.0061242345 
##         牛心湯  牛肉貢丸(7粒)         牛骨隨           牛筋         牛筋湯 
##   0.0026246719   0.1933508311   0.0131233596   0.0454943132   0.0061242345 
##           牛腱           牛雜         牛雜湯           牛鞭           冬粉 
##   0.0096237970   0.0323709536   0.0306211724   0.0008748906   0.0192475941 
##           加肉           玉米         玉米筍         白蘿蔔     油菜炒牛肉 
##   0.1146106737   0.0577427822   0.0113735783   0.0113735783   0.0139982502 
##       炒牛肉麵         炒牛雜       炒牛雜麵     芥蘭炒牛肉   金沙苦瓜牛肉 
##   0.0892388451   0.0551181102   0.0183727034   0.1986001750   0.0244969379 
##         青江菜     青椒炒牛肉     青蔥炒牛肉         娃娃菜     洋蔥炒牛肉 
##   0.0358705162   0.0454943132   0.0411198600   0.1041119860   0.0192475941 
##         凍豆腐           茼蒿         高麗菜   高麗菜炒牛肉       麻油牛心 
##   0.0358705162   0.0621172353   0.1461067367   0.0971128609   0.0096237970 
##       麻油牛腦 溫體牛三層肉鍋       溫體牛肉    溫體牛肉3+1     溫體牛肉湯 
##   0.0008748906   0.0096237970   0.2475940507   0.1688538933   0.3193350831 
##     溫體牛肉鍋  溫體牛肉鍋3+1     滑蛋炒牛肉         綜合湯         綜合菇 
##   0.1067366579   0.0586176728   0.1749781277   0.0533683290   0.0883639545 
##     蒜頭炒牛肉       蔥炒牛肉     薑絲炒牛肉     薑絲炒牛肚       爆炒牛筋 
##   0.0472440945   0.0008748906   0.0174978128   0.0402449694   0.0104986877

Use itemFrequencyPlot to plot the top 10 items.

itemFrequencyPlot(cow,topN = 10)

itemFrequencyPlot(cow,topN = 10,type = "absolute")

itemFrequencyPlot(cow,topN = 10,horiz = TRUE,
 main = "Item Frequency",xlab = "Relative Frequency")

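Plot the relative frequency of the items whose frequency is at least 0.1. The support argument is the minimum support an item needs in order to be drawn (0.1 is a common choice; without it, every item is plotted and the chart becomes cluttered).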

itemFrequencyPlot(cow,support = 0.1,
 main = "Item Frequency with S = 0.1",ylab = "Relative Frequency")
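
To see which items pass the 0.1 threshold without drawing the plot, the filter can be reproduced in base R. This is a small sketch that copies the 13 most frequent items from the itemFrequency(cow) output above; `freq`, `min_support` and `kept` are illustrative names.

```r
# Relative frequencies copied from the itemFrequency(cow) output above
# (only the 13 most frequent items are listed here)
freq <- c(
  "溫體牛肉湯"    = 0.3193350831,
  "溫體牛肉"      = 0.2475940507,
  "芥蘭炒牛肉"    = 0.1986001750,
  "牛肉貢丸(7粒)" = 0.1933508311,
  "滑蛋炒牛肉"    = 0.1749781277,
  "溫體牛肉3+1"   = 0.1688538933,
  "高麗菜"        = 0.1461067367,
  "加肉"          = 0.1146106737,
  "溫體牛肉鍋"    = 0.1067366579,
  "娃娃菜"        = 0.1041119860,
  "高麗菜炒牛肉"  = 0.0971128609,
  "炒牛肉麵"      = 0.0892388451,
  "綜合菇"        = 0.0883639545
)

min_support <- 0.1
kept <- sort(freq[freq >= min_support], decreasing = TRUE)

print(names(kept))         # items that would appear in the plot
print(round(kept * 1143))  # relative frequency converted back to order counts
```

Multiplying by the 1143 transactions recovers the absolute counts shown by `type = "absolute"`.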

Next, we apply the apriori function (the Apriori algorithm) and the eclat function (Equivalence CLAss Transformation, Eclat) to see whether the data yields any interesting findings:

Roughly speaking, working with apriori starts by setting two parameters, support and confidence, and then examining a third measure, lift, and a fourth, coverage:

  1. Support: how common the rule is in the data, i.e. the probability that A and B appear together. \(support(A \longrightarrow B) = P(A,B)\)

  2. Confidence: how reliable the rule is, i.e. the conditional probability that B is purchased given that A is purchased. \(confidence(A \longrightarrow B) = P(B|A)=P(A,B)/P(A)\)

  3. Lift: how much the presence of A raises the chance of B, i.e. the ratio of the conditional probability \(P(B|A)\) (the rate at which B appears given A) to the baseline probability \(P(B)\) of B appearing. \(lift(A \longrightarrow B) = P(B|A)/P(B) = confidence(A \longrightarrow B) / P(B)\)

  4. Coverage: how often the rule can be applied, also called LHS-support; it is the proportion of transactions that contain A. \(coverage(A \longrightarrow B) = P(A)\)
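
As a worked instance of the four definitions, the sketch below recomputes them for the rule {加肉} => {溫體牛肉} from raw order counts. The counts are taken from the outputs in this document (131 is 0.1146106737 × 1143); the variable names are illustrative only.

```r
# Counts taken from the outputs in this document
n    <- 1143  # total number of transactions
n_A  <- 131   # transactions containing 加肉 (A)
n_B  <- 283   # transactions containing 溫體牛肉 (B)
n_AB <- 72    # transactions containing both A and B

support    <- n_AB / n                 # P(A,B)
coverage   <- n_A / n                  # P(A)
confidence <- support / coverage       # P(B|A) = P(A,B) / P(A)
lift       <- confidence / (n_B / n)   # P(B|A) / P(B)

print(round(c(support = support, confidence = confidence,
              coverage = coverage, lift = lift), 6))
```

The results agree with the values that inspect() reports for this rule in the sorted listings of this document.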

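To begin, we put as few restrictions on apriori as possible and adjust based on the results. The minimum support is provisionally set to 0.001 and the minimum confidence to 0.5; all other parameters keep their defaults, and the resulting rules are stored as rules0: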

rules0=apriori(cow,parameter=list(support=0.001,confidence=0.5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 1 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(cow, parameter = list(support = 0.001, confidence = 0.5)):
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
##  done [0.00s].
## writing ... [15503 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

The output shows the parameter specification (including the minimum support and confidence), the algorithmic control settings recorded for the run, and some execution details.
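
The "checking subsets of size 1 2 3 ..." line in the log reflects the level-wise search of the Apriori algorithm. The following is a minimal base R sketch of that idea on hypothetical toy transactions (not taken from cow.csv); it is a teaching illustration under those assumptions, not how arules implements it internally.

```r
# Toy transactions (hypothetical data, not taken from cow.csv)
transactions <- list(
  c("soup", "beef", "cabbage"),
  c("soup", "beef"),
  c("soup", "noodles"),
  c("beef", "cabbage"),
  c("soup", "beef", "cabbage", "noodles")
)
min_count <- 3  # absolute minimum support count

# Number of transactions that contain every item of `itemset`
count_itemset <- function(itemset) {
  sum(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Level 1: frequent single items
items        <- sort(unique(unlist(transactions)))
frequent     <- Filter(function(s) count_itemset(s) >= min_count, as.list(items))
all_frequent <- frequent

# Level k: join frequent (k-1)-itemsets into candidates of size k,
# prune candidates that have an infrequent (k-1)-subset, then count the rest
k <- 2
while (length(frequent) > 1) {
  candidates <- list()
  for (i in seq_along(frequent)) {
    for (j in seq_along(frequent)) {
      if (i < j) {
        cand <- sort(union(frequent[[i]], frequent[[j]]))
        if (length(cand) == k) candidates[[paste(cand, collapse = ",")]] <- cand
      }
    }
  }
  keys <- vapply(frequent, paste, character(1), collapse = ",")
  has_frequent_subsets <- function(cand) {
    subsets <- combn(cand, k - 1, simplify = FALSE)
    all(vapply(subsets, function(s) paste(s, collapse = ",") %in% keys, logical(1)))
  }
  candidates   <- Filter(has_frequent_subsets, candidates)
  frequent     <- unname(Filter(function(s) count_itemset(s) >= min_count, candidates))
  all_frequent <- c(all_frequent, frequent)
  k <- k + 1
}

result <- vapply(all_frequent, paste, character(1), collapse = ",")
print(result)
```

The pruning step is what keeps the search tractable: a candidate is only counted if all of its smaller subsets were already found to be frequent.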

rules0      # show the number of association rules in rules0
## set of 15503 rules

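rules0 contains 15503 association rules, so printing all of them would not be very informative. Moreover, looking at the first 10 rules, the order in which rules are listed has no obvious relation to the values of the four measures (support, confidence, lift, coverage).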

inspect(rules0[1:10])   # show the first 10 rules of rules0
##      lhs                 rhs            support     confidence coverage   
## [1]  {三杯牛尾}       => {高麗菜炒牛肉} 0.001749781 1.0000000  0.001749781
## [2]  {牛心湯}         => {溫體牛肉湯}   0.001749781 0.6666667  0.002624672
## [3]  {牛筋湯}         => {芥蘭炒牛肉}   0.004374453 0.7142857  0.006124234
## [4]  {牛筋湯}         => {溫體牛肉湯}   0.003499563 0.5714286  0.006124234
## [5]  {牛心}           => {青江菜}       0.004374453 0.7142857  0.006124234
## [6]  {牛心}           => {娃娃菜}       0.003499563 0.5714286  0.006124234
## [7]  {牛心}           => {高麗菜}       0.004374453 0.7142857  0.006124234
## [8]  {牛心}           => {溫體牛肉}     0.004374453 0.7142857  0.006124234
## [9]  {溫體牛三層肉鍋} => {茼蒿}         0.006124234 0.6363636  0.009623797
## [10] {溫體牛三層肉鍋} => {溫體牛肉}     0.005249344 0.5454545  0.009623797
##      lift      count
## [1]  10.297297 2    
## [2]   2.087671 2    
## [3]   3.596602 5    
## [4]   1.789432 4    
## [5]  19.912892 5    
## [6]   5.488595 4    
## [7]   4.888794 5    
## [8]   2.884907 5    
## [9]  10.244558 7    
## [10]  2.203020 6

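Because order data is still accumulating and the number of transactions is limited, eight of the first 10 rules are backed by no more than five transactions and have low coverage, so they are clearly not ideal association rules.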

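Faced with this much noisy output, a better approach is to control the strength of the generated rules so that only a handful of strongly associated rules remain. The following attempts adjust the parameters: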

  1. Raise support to 0.005 and keep confidence at 0.5; the resulting rules are stored as rules01, giving 314 association rules.
rules01=apriori(cow,parameter=list(support=0.005,confidence=0.5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [314 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules01         # show the number of association rules in rules01
## set of 314 rules
inspect(rules01[1:10])       # show the first 10 rules
##      lhs                 rhs             support     confidence coverage   
## [1]  {溫體牛三層肉鍋} => {茼蒿}          0.006124234 0.6363636  0.009623797
## [2]  {溫體牛三層肉鍋} => {溫體牛肉}      0.005249344 0.5454545  0.009623797
## [3]  {爆炒牛筋}       => {溫體牛肉鍋3+1} 0.005249344 0.5000000  0.010498688
## [4]  {麻油牛心}       => {溫體牛肉3+1}   0.008748906 0.9090909  0.009623797
## [5]  {牛腱}           => {溫體牛肉鍋3+1} 0.005249344 0.5454545  0.009623797
## [6]  {牛腱}           => {茼蒿}          0.008748906 0.9090909  0.009623797
## [7]  {玉米筍}         => {溫體牛肉鍋}    0.006999125 0.6153846  0.011373578
## [8]  {玉米筍}         => {高麗菜}        0.007874016 0.6923077  0.011373578
## [9]  {洋蔥炒牛肉}     => {溫體牛肉湯}    0.011373578 0.5909091  0.019247594
## [10] {油菜炒牛肉}     => {高麗菜}        0.006999125 0.5000000  0.013998250
##      lift      count
## [1]  10.244558  7   
## [2]   2.203020  6   
## [3]   8.529851  6   
## [4]   5.383891 10   
## [5]   9.305292  6   
## [6]  14.635083 10   
## [7]   5.765448  8   
## [8]   4.738369  9   
## [9]   1.850436 13   
## [10]  3.422156  8
  2. Keep support at 0.005 and raise confidence to 0.7; the resulting rules are stored as rules02, giving 121 association rules.
rules02=apriori(cow,parameter=list(support=0.005,confidence=0.7))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.7    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [121 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules02         # show the number of association rules in rules02
## set of 121 rules
inspect(rules02[1:10])       # show the first 10 rules
##      lhs                            rhs             support     confidence
## [1]  {麻油牛心}                  => {溫體牛肉3+1}   0.008748906 0.9090909 
## [2]  {牛腱}                      => {茼蒿}          0.008748906 0.9090909 
## [3]  {油菜炒牛肉}                => {牛肉貢丸(7粒)} 0.010498688 0.7500000 
## [4]  {牛肉貢丸(7粒), 炒牛雜麵}   => {溫體牛肉}      0.005249344 1.0000000 
## [5]  {炒牛雜麵, 溫體牛肉}        => {牛肉貢丸(7粒)} 0.005249344 0.8571429 
## [6]  {玉米筍, 綜合菇}            => {高麗菜}        0.005249344 1.0000000 
## [7]  {溫體牛肉3+1, 薑絲炒牛肉}   => {牛肉貢丸(7粒)} 0.005249344 0.8571429 
## [8]  {牛肉貢丸(7粒), 薑絲炒牛肉} => {溫體牛肉3+1}   0.005249344 1.0000000 
## [9]  {油菜炒牛肉, 溫體牛肉3+1}   => {牛肉貢丸(7粒)} 0.005249344 1.0000000 
## [10] {油菜炒牛肉, 高麗菜}        => {牛肉貢丸(7粒)} 0.005249344 0.7500000 
##      coverage    lift      count
## [1]  0.009623797  5.383891 10   
## [2]  0.009623797 14.635083 10   
## [3]  0.013998250  3.878959 12   
## [4]  0.005249344  4.038869  6   
## [5]  0.006124234  4.433096  6   
## [6]  0.005249344  6.844311  6   
## [7]  0.006124234  4.433096  6   
## [8]  0.005249344  5.922280  6   
## [9]  0.005249344  5.171946  6   
## [10] 0.006999125  3.878959  6
  3. Raise support to 0.01 and confidence to 0.8; the resulting rules are stored as rules03, giving 3 association rules, all with coverage of at least 0.013.
rules03=apriori(cow,parameter=list(support=0.01,confidence=0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 11 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [3 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules03         # show the number of association rules in rules03
## set of 3 rules
inspect(rules03)         # inspect the rules
##     lhs                        rhs             support    confidence coverage  
## [1] {牛肉貢丸(7粒), 青江菜} => {綜合菇}        0.01312336 0.8333333  0.01574803
## [2] {加肉, 蒜頭炒牛肉}      => {溫體牛肉}      0.01137358 0.8125000  0.01399825
## [3] {高麗菜, 蒜頭炒牛肉}    => {牛肉貢丸(7粒)} 0.01137358 0.8666667  0.01312336
##     lift     count
## [1] 9.430693 15   
## [2] 3.281581 13   
## [3] 4.482353 13
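
Each log above prints an "Absolute minimum support count". The sketch below relates that number to the relative support threshold; it assumes the log truncates support × n, which is consistent with the printed values 1, 5 and 11 but is an inference from this output rather than a documented guarantee. The smallest count a rule actually needs is the next whole number up.

```r
n       <- 1143                  # transactions in cow
support <- c(0.001, 0.005, 0.01) # thresholds used for rules0, rules01/rules02, rules03

printed   <- trunc(support * n)   # assumption: value shown in the log
effective <- ceiling(support * n) # smallest count with count / n >= support

print(data.frame(support, printed, effective))
```

This matches the rule listings: with support 0.005, for example, no rule has a count below 6.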

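In addition, we can take an existing rule set (for example rules0), hold one threshold fixed, and pick the strongest rules by another measure. For rules0, which was mined with a confidence threshold of 0.5, we sort by support, confidence and lift in turn, store the results as rules.sorted_supp, rules.sorted_conf and rules.sorted_lift, and take the top 6 rules from each: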

rules.sorted_supp = sort ( rules0, by="support" )   
inspect ( rules.sorted_supp [1:6] )    
##     lhs                      rhs             support    confidence coverage  
## [1] {加肉}                => {溫體牛肉}      0.06299213 0.5496183  0.11461067
## [2] {炒牛肉麵}            => {溫體牛肉湯}    0.04461942 0.5000000  0.08923885
## [3] {綜合菇}              => {高麗菜}        0.04461942 0.5049505  0.08836395
## [4] {茼蒿}                => {溫體牛肉鍋}    0.03324584 0.5352113  0.06211724
## [5] {高麗菜, 溫體牛肉3+1} => {牛肉貢丸(7粒)} 0.03324584 0.5588235  0.05949256
## [6] {牛肉貢丸(7粒), 加肉} => {溫體牛肉}      0.03149606 0.7826087  0.04024497
##     lift     count
## [1] 2.219837 72   
## [2] 1.565753 51   
## [3] 3.456038 51   
## [4] 5.014315 38   
## [5] 2.890205 38   
## [6] 3.160854 36
rules.sorted_conf = sort ( rules0, by="confidence" )   
inspect ( rules.sorted_conf [1:6] )    
##     lhs                     rhs            support     confidence coverage   
## [1] {三杯牛尾}           => {高麗菜炒牛肉} 0.001749781 1          0.001749781
## [2] {三菇炒牛肉, 牛筋湯} => {芥蘭炒牛肉}   0.001749781 1          0.001749781
## [3] {三菇炒牛肉, 牛筋湯} => {溫體牛肉湯}   0.001749781 1          0.001749781
## [4] {牛筋湯, 青椒炒牛肉} => {芥蘭炒牛肉}   0.001749781 1          0.001749781
## [5] {牛筋湯, 青椒炒牛肉} => {溫體牛肉湯}   0.001749781 1          0.001749781
## [6] {牛筋湯, 溫體牛肉湯} => {芥蘭炒牛肉}   0.003499563 1          0.003499563
##     lift      count
## [1] 10.297297 2    
## [2]  5.035242 2    
## [3]  3.131507 2    
## [4]  5.035242 2    
## [5]  3.131507 2    
## [6]  5.035242 4
rules.sorted_lift = sort ( rules0, by="lift" )   
inspect ( rules.sorted_lift [1:6] ) 
##     lhs                                     rhs              support    
## [1] {芥蘭炒牛肉, 青椒炒牛肉, 溫體牛肉湯} => {牛筋湯}         0.001749781
## [2] {金沙苦瓜牛肉, 茼蒿}                 => {溫體牛三層肉鍋} 0.001749781
## [3] {茼蒿, 高麗菜, 爆炒牛筋}             => {溫體牛三層肉鍋} 0.002624672
## [4] {高麗菜, 溫體牛肉, 爆炒牛筋}         => {溫體牛三層肉鍋} 0.002624672
## [5] {金沙苦瓜牛肉, 茼蒿, 高麗菜炒牛肉}   => {溫體牛三層肉鍋} 0.001749781
## [6] {金沙苦瓜牛肉, 娃娃菜, 茼蒿}         => {溫體牛三層肉鍋} 0.001749781
##     confidence coverage    lift     count
## [1] 0.6666667  0.002624672 108.8571 2    
## [2] 1.0000000  0.001749781 103.9091 2    
## [3] 1.0000000  0.002624672 103.9091 3    
## [4] 1.0000000  0.002624672 103.9091 3    
## [5] 1.0000000  0.001749781 103.9091 2    
## [6] 1.0000000  0.001749781 103.9091 2

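With the confidence threshold at 0.5, only the top 6 rules selected by support appear to be practically meaningful, because each has a count (number of orders) above 30; the top rules by confidence and by lift rest on only 2 to 4 orders.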
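
One way to combine the two views is to keep only well-supported rules and then rank those by lift. The base R sketch below does this for the six rules from inspect(rules.sorted_supp[1:6]), with values copied from that output; the data frame `top_supp` is an illustrative stand-in for the rules object.

```r
# Quality measures copied from inspect(rules.sorted_supp[1:6]) above
top_supp <- data.frame(
  rule = c("{加肉} => {溫體牛肉}",
           "{炒牛肉麵} => {溫體牛肉湯}",
           "{綜合菇} => {高麗菜}",
           "{茼蒿} => {溫體牛肉鍋}",
           "{高麗菜, 溫體牛肉3+1} => {牛肉貢丸(7粒)}",
           "{牛肉貢丸(7粒), 加肉} => {溫體牛肉}"),
  support    = c(0.06299213, 0.04461942, 0.04461942, 0.03324584, 0.03324584, 0.03149606),
  confidence = c(0.5496183, 0.5000000, 0.5049505, 0.5352113, 0.5588235, 0.7826087),
  lift       = c(2.219837, 1.565753, 3.456038, 5.014315, 2.890205, 3.160854),
  count      = c(72, 51, 51, 38, 38, 36)
)

# Keep rules backed by at least 30 orders, then rank them by lift
well_supported <- top_supp[top_supp$count >= 30, ]
by_lift <- well_supported[order(well_supported$lift, decreasing = TRUE), ]

print(by_lift[, c("rule", "lift", "count")])
```

Ranked this way, {茼蒿} => {溫體牛肉鍋} comes first even though it is only fourth by support. On the full rule set, something like `sort(subset(rules0, count >= 30), by = "lift")` should apply the same idea with arules directly.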

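Next, let us look at a practical application of association rules. In stores we often see two products bundled and sold together, perhaps because the retailer wants to promote one of them. Suppose the restaurant wants to promote a less popular item, 蒜頭炒牛肉: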

First consider rules05, the rules with support of at least 0.005 and confidence of at least 0.05:

rules05=apriori(cow,parameter=list(maxlen=2,supp=0.005,conf=0.05),appearance=list(rhs="蒜頭炒牛肉",default="lhs"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.05    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf =
## 0.05), : Mining stopped (maxlen reached). Only patterns up to a length of 2
## returned!
##  done [0.00s].
## writing ... [9 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rules05)
##     lhs                rhs          support     confidence coverage   lift    
## [1] {炒牛雜}        => {蒜頭炒牛肉} 0.006999125 0.12698413 0.05511811 2.687831
## [2] {溫體牛肉鍋}    => {蒜頭炒牛肉} 0.006124234 0.05737705 0.10673666 1.214481
## [3] {綜合菇}        => {蒜頭炒牛肉} 0.006124234 0.06930693 0.08836395 1.466997
## [4] {加肉}          => {蒜頭炒牛肉} 0.013998250 0.12213740 0.11461067 2.585242
## [5] {娃娃菜}        => {蒜頭炒牛肉} 0.007874016 0.07563025 0.10411199 1.600840
## [6] {溫體牛肉3+1}   => {蒜頭炒牛肉} 0.009623797 0.05699482 0.16885389 1.206390
## [7] {高麗菜}        => {蒜頭炒牛肉} 0.013123360 0.08982036 0.14610674 1.901198
## [8] {牛肉貢丸(7粒)} => {蒜頭炒牛肉} 0.015748031 0.08144796 0.19335083 1.723982
## [9] {溫體牛肉}      => {蒜頭炒牛肉} 0.024496938 0.09893993 0.24759405 2.094229
##     count
## [1]  8   
## [2]  7   
## [3]  7   
## [4] 16   
## [5]  9   
## [6] 11   
## [7] 15   
## [8] 18   
## [9] 28

Raising the support requirement to at least 0.01 and the confidence requirement to at least 0.1 narrows the result down to a single rule.

rules06=apriori(cow,parameter=list(maxlen=2,supp=0.01,conf=0.1),appearance=list(rhs="蒜頭炒牛肉",default="lhs"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 11 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.01, conf = 0.1),
## : Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
##  done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rules06)
##     lhs       rhs          support    confidence coverage  lift     count
## [1] {加肉} => {蒜頭炒牛肉} 0.01399825 0.1221374  0.1146107 2.585242 16

The result shows that “加肉” is the item most strongly associated with “蒜頭炒牛肉”, so the two could be considered for bundled sale.
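
For a bundling decision it helps to remember that lift is symmetric while confidence is not. Writing A for 加肉 and B for 蒜頭炒牛肉, and using the counts implied by the outputs in this document (16 orders with both, 131 with A, 54 with B, 1143 in total), a short derivation gives:

\[
lift(A \longrightarrow B) = \frac{P(B|A)}{P(B)} = \frac{P(A,B)}{P(A)\,P(B)} = \frac{P(A|B)}{P(A)} = lift(B \longrightarrow A)
\]

\[
confidence(A \longrightarrow B) = \frac{16}{131} \approx 0.122, \qquad confidence(B \longrightarrow A) = \frac{16}{54} \approx 0.296, \qquad lift = \frac{16 \times 1143}{131 \times 54} \approx 2.585
\]

So about 12% of orders with 加肉 also include 蒜頭炒牛肉, while about 30% of orders with 蒜頭炒牛肉 also include 加肉; the lift of 2.585 is the same in both directions.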

Now suppose the restaurant wants to promote another less popular item, 青蔥炒牛肉:

rules07=apriori(cow,parameter=list(maxlen=2,supp=0.005,conf=0.05),appearance=list(rhs="青蔥炒牛肉",default="lhs"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.05    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf =
## 0.05), : Mining stopped (maxlen reached). Only patterns up to a length of 2
## returned!
##  done [0.00s].
## writing ... [5 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rules07)
##     lhs             rhs          support     confidence coverage   lift    
## [1] {薑絲炒牛肚} => {青蔥炒牛肉} 0.005249344 0.13043478 0.04024497 3.172063
## [2] {青江菜}     => {青蔥炒牛肉} 0.005249344 0.14634146 0.03587052 3.558900
## [3] {炒牛肉麵}   => {青蔥炒牛肉} 0.006999125 0.07843137 0.08923885 1.907384
## [4] {加肉}       => {青蔥炒牛肉} 0.006124234 0.05343511 0.11461067 1.299497
## [5] {溫體牛肉湯} => {青蔥炒牛肉} 0.016622922 0.05205479 0.31933508 1.265928
##     count
## [1]  6   
## [2]  6   
## [3]  8   
## [4]  7   
## [5] 19

Raising the confidence requirement to at least 0.1 narrows the result down to two rules.

rules08=apriori(cow,parameter=list(maxlen=2,supp=0.005,conf=0.10),appearance=list(rhs="青蔥炒牛肉",default="lhs"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf = 0.1),
## : Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
##  done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rules08)
##     lhs             rhs          support     confidence coverage   lift    
## [1] {薑絲炒牛肚} => {青蔥炒牛肉} 0.005249344 0.1304348  0.04024497 3.172063
## [2] {青江菜}     => {青蔥炒牛肉} 0.005249344 0.1463415  0.03587052 3.558900
##     count
## [1] 6    
## [2] 6

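The result shows that “薑絲炒牛肚” and “青江菜” are the items most strongly associated with “青蔥炒牛肉”, and their support, confidence and lift with respect to “青蔥炒牛肉” are very close (each rule rests on 6 orders), so either could be considered for bundling with “青蔥炒牛肉”.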

Next, we use the eclat function to find the item combinations best suited to bundled sale or to being placed next to each other:

elc_set=eclat(cow,parameter=list(minlen=1, maxlen=3, supp=0.005, target = "frequent itemsets"), control=list(sort=-1))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE   0.005      1      3 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -1    TRUE
## 
## Absolute minimum support count: 5 
## 
## create itemset ... 
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating sparse bit matrix ... [45 row(s), 1143 column(s)] done [0.00s].
## writing  ... [485 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(elc_set[1:5])
##     items                      support     count
## [1] {麻油牛心, 溫體牛肉3+1}    0.008748906 10   
## [2] {牛腱, 茼蒿}               0.008748906 10   
## [3] {牛腱, 溫體牛肉鍋3+1}      0.005249344  6   
## [4] {溫體牛三層肉鍋, 溫體牛肉} 0.005249344  6   
## [5] {茼蒿, 溫體牛三層肉鍋}     0.006124234  7
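
Eclat finds frequent itemsets by intersecting lists of transaction ids rather than rescanning the transactions. The following is a minimal base R sketch of that idea on hypothetical toy transactions (not taken from cow.csv); the function `eclat_dfs` is illustrative and is not the arules implementation.

```r
# Toy transactions (hypothetical data, not taken from cow.csv)
transactions <- list(
  c("soup", "beef", "cabbage"),
  c("soup", "beef"),
  c("soup", "noodles"),
  c("beef", "cabbage"),
  c("soup", "beef", "cabbage", "noodles")
)
min_count <- 3  # absolute minimum support count

# Vertical layout: for each item, the ids of the transactions that contain it
items    <- sort(unique(unlist(transactions)))
tidlists <- lapply(items, function(it) {
  which(vapply(transactions, function(t) it %in% t, logical(1)))
})
names(tidlists) <- items

# Depth-first search: extend `prefix` with each later item and intersect
# the tid-lists; the length of the intersection is the support count
eclat_dfs <- function(prefix, prefix_tids, candidates) {
  found <- integer(0)
  for (i in seq_along(candidates)) {
    item <- candidates[i]
    tids <- intersect(prefix_tids, tidlists[[item]])
    if (length(tids) >= min_count) {
      itemset <- c(prefix, item)
      hit <- length(tids)
      names(hit) <- paste(itemset, collapse = ",")
      found <- c(found, hit, eclat_dfs(itemset, tids, candidates[-seq_len(i)]))
    }
  }
  found
}

frequent <- eclat_dfs(character(0), seq_along(transactions), items)
print(frequent)
```

Because an itemset's tid-list can only shrink as items are added, a branch is abandoned as soon as the intersection falls below the minimum count.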

Raise the minimum support to 0.01:

elc_set=eclat(cow,parameter=list(minlen=1, maxlen=3, supp=0.01, target = "frequent itemsets"), control=list(sort=-1))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.01      1      3 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -1    TRUE
## 
## Absolute minimum support count: 11 
## 
## create itemset ... 
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating sparse bit matrix ... [40 row(s), 1143 column(s)] done [0.00s].
## writing  ... [206 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(elc_set[1:5])
##     items                       support    count
## [1] {牛肉貢丸(7粒), 油菜炒牛肉} 0.01049869 12   
## [2] {冬粉, 凍豆腐}              0.01049869 12   
## [3] {洋蔥炒牛肉, 溫體牛肉湯}    0.01137358 13   
## [4] {三菇炒牛肉, 溫體牛肉湯}    0.01137358 13   
## [5] {牛三層肉, 溫體牛肉}        0.01224847 14

Raise the minimum support to 0.07:

elc_set=eclat(cow,parameter=list(minlen=1, maxlen=3, supp=0.07, target = "frequent itemsets"), control=list(sort=-1))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.07      1      3 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -1    TRUE
## 
## Absolute minimum support count: 80 
## 
## create itemset ... 
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [13 item(s)] done [0.00s].
## creating bit matrix ... [13 row(s), 1143 column(s)] done [0.00s].
## writing  ... [15 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(elc_set[1:5])
##     items                     support    count
## [1] {牛肉貢丸(7粒), 溫體牛肉} 0.08836395 101  
## [2] {芥蘭炒牛肉, 溫體牛肉湯}  0.07261592  83  
## [3] {溫體牛肉湯}              0.31933508 365  
## [4] {溫體牛肉}                0.24759405 283  
## [5] {芥蘭炒牛肉}              0.19860017 227

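In the end, the two best combinations for joint sale are {牛肉貢丸(7粒), 溫體牛肉} and {芥蘭炒牛肉, 溫體牛肉湯}: they have the highest support among the item pairs, with 101 and 83 orders respectively.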

Visualizing association rules

Plots give a more intuitive view of the results of association analysis. This requires the R extension package arulesViz.

Here are some simple uses of this package.

# install.packages("arulesViz", repos="http://cran.us.r-project.org")
library (arulesViz)  
# cow was already created with read.transactions() above, so no data("cow") call is needed

Next, set support to 0.002 and confidence to 0.5, and store the resulting rules as rules09.

rules09 = apriori (cow, parameter = list ( support=0.002, confidence=0.5 ) )  
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.002      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [46 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.00s].
## writing ... [2311 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules09
## set of 2311 rules

Next, draw a scatter plot of rules09:

plot(rules09)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

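Each point in the plot represents a rule by its support (horizontal axis) and confidence (vertical axis), and the colour intensity is determined by its lift. The variables mapped to the axes and to colour can be changed through arguments, for example: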

plot(rules09, measure=c("support", "lift"), shading ="confidence")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

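From the plot alone we cannot tell which items a rule of interest corresponds to. This can be addressed by turning on interactive mode (the interactive part cannot be shown on a web page; it can be run separately in R, where rules are selected by clicking with the mouse; the call is commented out with # in the code block below).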

# plot(rules09, interactive=TRUE)

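We can also set the shading argument to “order” to draw a special kind of scatter plot, the “two-key plot”, in which colour indicates the number of items in a rule: the more items (the higher the order), the darker the point.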

plot(rules09, shading="order", control=list(main = "Two-key plot"))
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

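Next we change the plot type to “grouped” to produce a “grouped matrix” plot. Judged by lift, the two most strongly associated groups (darkest circles) are 薑絲炒牛肚 and 牛筋, among other items, with 牛腱; the strongest combinations by support (largest circles) can be read off in the same way.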

plot(rules09, method= "grouped")

The method argument can also be set to “matrix”, “matrix3D”, “paracoord”, and so on:

The following plot uses the “matrix” method to show the first 50 association rules of rules09, with their 31 LHS (Left Hand Side) and 15 RHS (Right Hand Side) itemsets. The colour intensity represents the value of lift.

plot(rules09[1:50], method="matrix", measure="lift")
## Itemsets in Antecedent (LHS)
##  [1] "{高麗菜,溫體牛三層肉鍋}"   "{茼蒿,爆炒牛筋}"          
##  [3] "{高麗菜,爆炒牛筋}"         "{牛心,娃娃菜}"            
##  [5] "{牛心,高麗菜}"             "{牛腱}"                   
##  [7] "{冬粉}"                    "{溫體牛三層肉鍋,爆炒牛筋}"
##  [9] "{牛心,溫體牛肉}"           "{爆炒牛筋}"               
## [11] "{牛心}"                    "{溫體牛三層肉鍋}"         
## [13] "{白蘿蔔}"                  "{麻油牛心}"               
## [15] "{玉米筍}"                  "{牛筋湯,溫體牛肉湯}"      
## [17] "{茼蒿}"                    "{牛心,青江菜}"            
## [19] "{凍豆腐}"                  "{青江菜}"                 
## [21] "{玉米}"                    "{綜合菇}"                 
## [23] "{油菜炒牛肉}"              "{牛雜}"                   
## [25] "{牛筋湯}"                  "{牛筋湯,芥蘭炒牛肉}"      
## [27] "{加肉}"                    "{蒜頭炒牛肉}"             
## [29] "{洋蔥炒牛肉}"              "{三菇炒牛肉}"             
## [31] "{炒牛肉麵}"               
## Itemsets in Consequent (RHS)
##  [1] "{溫體牛肉湯}"     "{溫體牛肉}"       "{牛肉貢丸(7粒)}"  "{芥蘭炒牛肉}"    
##  [5] "{高麗菜}"         "{溫體牛肉鍋}"     "{溫體牛肉3+1}"    "{綜合菇}"        
##  [9] "{娃娃菜}"         "{溫體牛肉鍋3+1}"  "{茼蒿}"           "{凍豆腐}"        
## [13] "{青江菜}"         "{溫體牛三層肉鍋}" "{爆炒牛筋}"

The first 50 association rules of rules09 are listed in full below:

inspect(rules09[1:50])
##      lhs                           rhs              support     confidence
## [1]  {牛筋湯}                   => {芥蘭炒牛肉}     0.004374453 0.7142857 
## [2]  {牛筋湯}                   => {溫體牛肉湯}     0.003499563 0.5714286 
## [3]  {牛心}                     => {青江菜}         0.004374453 0.7142857 
## [4]  {牛心}                     => {娃娃菜}         0.003499563 0.5714286 
## [5]  {牛心}                     => {高麗菜}         0.004374453 0.7142857 
## [6]  {牛心}                     => {溫體牛肉}       0.004374453 0.7142857 
## [7]  {溫體牛三層肉鍋}           => {茼蒿}           0.006124234 0.6363636 
## [8]  {溫體牛三層肉鍋}           => {溫體牛肉}       0.005249344 0.5454545 
## [9]  {麻油牛心}                 => {溫體牛肉3+1}    0.008748906 0.9090909 
## [10] {爆炒牛筋}                 => {溫體牛肉鍋3+1}  0.005249344 0.5000000 
## [11] {牛腱}                     => {溫體牛肉鍋3+1}  0.005249344 0.5454545 
## [12] {牛腱}                     => {茼蒿}           0.008748906 0.9090909 
## [13] {玉米筍}                   => {溫體牛肉鍋}     0.006999125 0.6153846 
## [14] {玉米筍}                   => {高麗菜}         0.007874016 0.6923077 
## [15] {洋蔥炒牛肉}               => {溫體牛肉湯}     0.011373578 0.5909091 
## [16] {油菜炒牛肉}               => {高麗菜}         0.006999125 0.5000000 
## [17] {油菜炒牛肉}               => {牛肉貢丸(7粒)}  0.010498688 0.7500000 
## [18] {油菜炒牛肉}               => {溫體牛肉}       0.007874016 0.5625000 
## [19] {白蘿蔔}                   => {溫體牛肉鍋3+1}  0.006124234 0.5384615 
## [20] {白蘿蔔}                   => {牛肉貢丸(7粒)}  0.006124234 0.5384615 
## [21] {三菇炒牛肉}               => {溫體牛肉湯}     0.011373578 0.5200000 
## [22] {冬粉}                     => {凍豆腐}         0.010498688 0.5454545 
## [23] {冬粉}                     => {溫體牛肉鍋3+1}  0.009623797 0.5000000 
## [24] {牛雜}                     => {溫體牛肉}       0.021872266 0.6756757 
## [25] {凍豆腐}                   => {溫體牛肉鍋}     0.018372703 0.5121951 
## [26] {青江菜}                   => {綜合菇}         0.019247594 0.5365854 
## [27] {青江菜}                   => {溫體牛肉}       0.022747157 0.6341463 
## [28] {蒜頭炒牛肉}               => {溫體牛肉}       0.024496938 0.5185185 
## [29] {炒牛肉麵}                 => {溫體牛肉湯}     0.044619423 0.5000000 
## [30] {茼蒿}                     => {溫體牛肉鍋}     0.033245844 0.5352113 
## [31] {玉米}                     => {高麗菜}         0.030621172 0.5303030 
## [32] {綜合菇}                   => {高麗菜}         0.044619423 0.5049505 
## [33] {加肉}                     => {溫體牛肉}       0.062992126 0.5496183 
## [34] {牛筋湯, 芥蘭炒牛肉}       => {溫體牛肉湯}     0.003499563 0.8000000 
## [35] {牛筋湯, 溫體牛肉湯}       => {芥蘭炒牛肉}     0.003499563 1.0000000 
## [36] {牛心, 青江菜}             => {娃娃菜}         0.002624672 0.6000000 
## [37] {牛心, 娃娃菜}             => {青江菜}         0.002624672 0.7500000 
## [38] {牛心, 青江菜}             => {高麗菜}         0.003499563 0.8000000 
## [39] {牛心, 高麗菜}             => {青江菜}         0.003499563 0.8000000 
## [40] {牛心, 青江菜}             => {溫體牛肉}       0.003499563 0.8000000 
## [41] {牛心, 溫體牛肉}           => {青江菜}         0.003499563 0.8000000 
## [42] {牛心, 娃娃菜}             => {溫體牛肉}       0.003499563 1.0000000 
## [43] {牛心, 溫體牛肉}           => {娃娃菜}         0.003499563 0.8000000 
## [44] {牛心, 高麗菜}             => {溫體牛肉}       0.002624672 0.6000000 
## [45] {牛心, 溫體牛肉}           => {高麗菜}         0.002624672 0.6000000 
## [46] {溫體牛三層肉鍋, 爆炒牛筋} => {茼蒿}           0.002624672 1.0000000 
## [47] {茼蒿, 爆炒牛筋}           => {溫體牛三層肉鍋} 0.002624672 0.7500000 
## [48] {溫體牛三層肉鍋, 爆炒牛筋} => {高麗菜}         0.002624672 1.0000000 
## [49] {高麗菜, 溫體牛三層肉鍋}   => {爆炒牛筋}       0.002624672 1.0000000 
## [50] {高麗菜, 爆炒牛筋}         => {溫體牛三層肉鍋} 0.002624672 0.7500000 
##      coverage    lift      count
## [1]  0.006124234  3.596602  5   
## [2]  0.006124234  1.789432  4   
## [3]  0.006124234 19.912892  5   
## [4]  0.006124234  5.488595  4   
## [5]  0.006124234  4.888794  5   
## [6]  0.006124234  2.884907  5   
## [7]  0.009623797 10.244558  7   
## [8]  0.009623797  2.203020  6   
## [9]  0.009623797  5.383891 10   
## [10] 0.010498688  8.529851  6   
## [11] 0.009623797  9.305292  6   
## [12] 0.009623797 14.635083 10   
## [13] 0.011373578  5.765448  8   
## [14] 0.011373578  4.738369  9   
## [15] 0.019247594  1.850436 13   
## [16] 0.013998250  3.422156  8   
## [17] 0.013998250  3.878959 12   
## [18] 0.013998250  2.271864  9   
## [19] 0.011373578  9.185993  7   
## [20] 0.011373578  2.784894  7   
## [21] 0.021872266  1.628384 13   
## [22] 0.019247594 15.206208 12   
## [23] 0.019247594  8.529851 11   
## [24] 0.032370954  2.728966 25   
## [25] 0.035870516  4.798681 21   
## [26] 0.035870516  6.072446 22   
## [27] 0.035870516  2.561234 26   
## [28] 0.047244094  2.094229 28   
## [29] 0.089238845  1.565753 51   
## [30] 0.062117235  5.014315 38   
## [31] 0.057742782  3.629559 35   
## [32] 0.088363955  3.456038 51   
## [33] 0.114610674  2.219837 72   
## [34] 0.004374453  2.505205  4   
## [35] 0.003499563  5.035242  4   
## [36] 0.004374453  5.763025  3   
## [37] 0.003499563 20.908537  3   
## [38] 0.004374453  5.475449  4   
## [39] 0.004374453 22.302439  4   
## [40] 0.004374453  3.231095  4   
## [41] 0.004374453 22.302439  4   
## [42] 0.003499563  4.038869  4   
## [43] 0.004374453  7.684034  4   
## [44] 0.004374453  2.423322  3   
## [45] 0.004374453  4.106587  3   
## [46] 0.002624672 16.098592  3   
## [47] 0.003499563 77.931818  3   
## [48] 0.002624672  6.844311  3   
## [49] 0.002624672 95.250000  3   
## [50] 0.003499563 77.931818  3