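Market basket analysis, also known as association analysis, mines large volumes of transaction data for hidden business rules that link items together. The classic example is beer and diapers. Below we use R to walk through a simple example based on storefront sales data from a Tainan beef soup shop (data updated through 2024/04).
First, install and load the arules package.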
# install.packages("arules", repos="http://cran.us.r-project.org")
library("arules")
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
Next, read the storefront sales data and use summary to get an overview of it.
cow <- read.transactions("cow.csv")
summary(cow)
## transactions as itemMatrix in sparse format with
## 1143 rows (elements/itemsets/transactions) and
## 50 columns (items) and a density of 0.05989501
##
## most frequent items:
## 溫體牛肉湯 溫體牛肉 芥蘭炒牛肉 牛肉貢丸(7粒) 滑蛋炒牛肉
## 365 283 227 221 200
## (Other)
## 2127
##
## element (itemset/transaction) length distribution:
## sizes
## 0 1 2 3 4 5 6 7 8 9 11
## 30 189 311 233 172 106 55 29 8 6 4
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.000 3.000 2.995 4.000 11.000
##
## includes extended item information - examples:
## labels
## 1 三杯牛尾
## 2 三菇炒牛肉
## 3 牛三層肉
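The density and mean transaction length in the summary can be cross-checked from the length distribution alone. The base R sketch below simply re-enters the numbers printed above (the variable names are illustrative) and treats density as total item occurrences divided by rows times columns.

```r
# Transaction sizes and how many transactions have each size (from summary(cow))
sizes  <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11)
counts <- c(30, 189, 311, 233, 172, 106, 55, 29, 8, 6, 4)

n_transactions <- sum(counts)          # rows of the item matrix
n_items        <- 50                   # columns of the item matrix
total_entries  <- sum(sizes * counts)  # item occurrences across all transactions

mean_size <- total_entries / n_transactions
density   <- total_entries / (n_transactions * n_items)

print(n_transactions)        # 1143
print(round(mean_size, 3))   # mean transaction length, 2.995
print(round(density, 8))     # 0.05989501
```

The same table also shows 30 transactions of size 0, that is, rows that contain no items at all.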
Show the first eight transactions in the order data:
inspect(cow[1:8])
## items
## [1] {青椒炒牛肉, 綜合湯}
## [2] {牛肉貢丸(7粒), 加肉, 玉米, 娃娃菜, 高麗菜, 溫體牛肉3+1, 滑蛋炒牛肉}
## [3] {芥蘭炒牛肉, 溫體牛肉湯, 滑蛋炒牛肉}
## [4] {溫體牛肉湯}
## [5] {溫體牛肉湯}
## [6] {三菇炒牛肉, 溫體牛肉湯}
## [7] {溫體牛肉湯, 滑蛋炒牛肉}
## [8] {溫體牛肉湯, 滑蛋炒牛肉, 綜合湯}
Besides looking at the items themselves, we can use the size function to see how many distinct items each purchase contains.
size(cow[1:8])
## [1] 2 7 3 1 1 2 2 3
We then use the itemFrequency function to list the share of transactions that contain each item, which also reveals the most frequently ordered items.
itemFrequency(cow)
## 三杯牛尾 三菇炒牛肉 牛三層肉 牛三層肉湯 牛心
## 0.0017497813 0.0218722660 0.0262467192 0.0489938758 0.0061242345
## 牛心湯 牛肉貢丸(7粒) 牛骨隨 牛筋 牛筋湯
## 0.0026246719 0.1933508311 0.0131233596 0.0454943132 0.0061242345
## 牛腱 牛雜 牛雜湯 牛鞭 冬粉
## 0.0096237970 0.0323709536 0.0306211724 0.0008748906 0.0192475941
## 加肉 玉米 玉米筍 白蘿蔔 油菜炒牛肉
## 0.1146106737 0.0577427822 0.0113735783 0.0113735783 0.0139982502
## 炒牛肉麵 炒牛雜 炒牛雜麵 芥蘭炒牛肉 金沙苦瓜牛肉
## 0.0892388451 0.0551181102 0.0183727034 0.1986001750 0.0244969379
## 青江菜 青椒炒牛肉 青蔥炒牛肉 娃娃菜 洋蔥炒牛肉
## 0.0358705162 0.0454943132 0.0411198600 0.1041119860 0.0192475941
## 凍豆腐 茼蒿 高麗菜 高麗菜炒牛肉 麻油牛心
## 0.0358705162 0.0621172353 0.1461067367 0.0971128609 0.0096237970
## 麻油牛腦 溫體牛三層肉鍋 溫體牛肉 溫體牛肉3+1 溫體牛肉湯
## 0.0008748906 0.0096237970 0.2475940507 0.1688538933 0.3193350831
## 溫體牛肉鍋 溫體牛肉鍋3+1 滑蛋炒牛肉 綜合湯 綜合菇
## 0.1067366579 0.0586176728 0.1749781277 0.0533683290 0.0883639545
## 蒜頭炒牛肉 蔥炒牛肉 薑絲炒牛肉 薑絲炒牛肚 爆炒牛筋
## 0.0472440945 0.0008748906 0.0174978128 0.0402449694 0.0104986877
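Relative frequencies map back to order counts by multiplying by the number of transactions. As a small check, the sketch below takes five of the values printed above and recovers the counts reported under "most frequent items" in the summary. The vector is typed in by hand for illustration.

```r
n_transactions <- 1143

# Relative frequencies copied from the itemFrequency(cow) output
rel_freq <- c(
  "溫體牛肉湯"    = 0.3193350831,
  "溫體牛肉"      = 0.2475940507,
  "芥蘭炒牛肉"    = 0.1986001750,
  "牛肉貢丸(7粒)" = 0.1933508311,
  "滑蛋炒牛肉"    = 0.1749781277
)

# Relative frequency x number of transactions = absolute count
abs_count <- round(rel_freq * n_transactions)
print(abs_count)  # 365 283 227 221 200
```

With the transactions object itself, `itemFrequency(cow, type = "absolute")` should give the counts directly.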
Use itemFrequencyPlot to plot the top 10 items.
itemFrequencyPlot(cow, topN = 10)
itemFrequencyPlot(cow, topN = 10, type = "absolute")
itemFrequencyPlot(cow, topN = 10, horiz = TRUE,
                  main = "Item Frequency", xlab = "Relative Frequency")
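Plot the relative frequency of only those items whose share is at least 0.1. The support argument sets the minimum support (0.1 is a common choice). If it is omitted and topN is not given either, every item is plotted and the chart becomes cluttered.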
itemFrequencyPlot(cow, support = 0.1,
                  main = "Item Frequency with S = 0.1", ylab = "Relative Frequency")
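To see which items a given support cutoff keeps without drawing a plot, the frequencies can be filtered directly. The sketch below uses a small set of hypothetical baskets (items A, B, C), not the shop's data, and a cutoff of 0.5 chosen only for illustration.

```r
library(arules)

# Hypothetical toy baskets (not the shop's data)
baskets <- list(c("A", "B"), c("A", "B", "C"), c("A", "C"), c("B"), c("A", "B"))
tx <- as(baskets, "transactions")

freq <- itemFrequency(tx)           # relative frequency per item
kept <- names(freq)[freq >= 0.5]    # items that a support cutoff of 0.5 would keep

print(sort(freq, decreasing = TRUE))
print(kept)
```

Here A and B each appear in 4 of 5 baskets (0.8) and pass the cutoff, while C (0.4) does not.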
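Next, we try the apriori function (the Apriori algorithm) and the eclat function (Equivalence CLAss Transformation, Eclat) to see whether the data yields any interesting findings:
The apriori algorithm works roughly as follows: first set minimum thresholds for two parameters, support and confidence, then examine two further measures of each resulting rule, lift and coverage: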
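Support: how common the rule is in the data, that is, the probability that A and B appear together. \(\mathrm{support}(A \longrightarrow B) = P(A,B)\)
Confidence: how reliable the rule is, that is, the conditional probability of buying B given that A was bought. \(\mathrm{confidence}(A \longrightarrow B) = P(B|A) = P(A,B)/P(A)\)
Lift: how much the presence of A raises the chance of B, that is, the ratio of the conditional probability \(P(B|A)\) (how often B appears when A is present) to the baseline probability \(P(B)\). \(\mathrm{lift}(A \longrightarrow B) = P(B|A)/P(B) = \mathrm{confidence}(A \longrightarrow B)/P(B)\)
Coverage: how often the rule can be applied, also called LHS-support, that is, the share of transactions that contain A. \(\mathrm{coverage}(A \longrightarrow B) = P(A)\)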
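As a worked example of these four definitions, the sketch below recomputes the measures for the rule {加肉} => {溫體牛肉}, which appears among the mined rules in this walkthrough with a count of 72. The count of 131 orders containing 加肉 is an inference from its reported item frequency (0.1146106737 × 1143), not a number printed in the output.

```r
n    <- 1143  # total transactions
n_A  <- 131   # orders containing 加肉 (inferred from 0.1146106737 * 1143)
n_B  <- 283   # orders containing 溫體牛肉 (from the summary output)
n_AB <- 72    # orders containing both (the rule's count)

support    <- n_AB / n               # P(A, B)
coverage   <- n_A / n                # P(A)
confidence <- support / coverage     # P(B | A) = P(A, B) / P(A)
lift       <- confidence / (n_B / n) # P(B | A) / P(B)

print(round(c(support = support, confidence = confidence,
              coverage = coverage, lift = lift), 7))
```

The results (about 0.0630, 0.5496, 0.1146 and 2.2198) agree with the values arules reports for this rule.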
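First, place as few restrictions as possible on apriori and adjust based on the results. Set the minimum support to 0.001 and the minimum confidence to 0.5 for now, leave the other parameters at their defaults, and name the resulting rules rules0: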
rules0 <- apriori(cow, parameter = list(support = 0.001, confidence = 0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 1
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(cow, parameter = list(support = 0.001, confidence = 0.5)):
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
## done [0.00s].
## writing ... [15503 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
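The output shows the parameter specification (including the minimum support and confidence), the algorithmic control settings that record the parameters used while the algorithm runs, and some execution details.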
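The "Absolute minimum support count" line in these logs translates the relative support threshold into a number of orders. The sketch below reproduces the logged values for the thresholds used in this walkthrough by truncating support × 1143. This is an observation about the logs shown here, not a statement about how arules computes the value internally.

```r
n_transactions <- 1143

# Support thresholds used in this walkthrough
support_thresholds <- c(0.001, 0.002, 0.005, 0.01, 0.07)

# Truncating support x n matches the counts printed in the logs
min_counts <- floor(support_thresholds * n_transactions)

print(data.frame(support = support_thresholds, min_count = min_counts))
```

This gives 1, 2, 5, 11 and 80, so a support of 0.001 lets through rules that rest on only a couple of orders.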
rules0 # show how many association rules rules0 contains
## set of 15503 rules
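rules0 contains 15503 association rules, and there is little point in displaying all of them. Looking at the first 10 rules also shows that the order in which rules are listed has no obvious relationship to the values of the four strength measures (support, confidence, lift, coverage).
inspect(rules0[1:10]) # show the first 10 rules of rules0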
## lhs rhs support confidence coverage
## [1] {三杯牛尾} => {高麗菜炒牛肉} 0.001749781 1.0000000 0.001749781
## [2] {牛心湯} => {溫體牛肉湯} 0.001749781 0.6666667 0.002624672
## [3] {牛筋湯} => {芥蘭炒牛肉} 0.004374453 0.7142857 0.006124234
## [4] {牛筋湯} => {溫體牛肉湯} 0.003499563 0.5714286 0.006124234
## [5] {牛心} => {青江菜} 0.004374453 0.7142857 0.006124234
## [6] {牛心} => {娃娃菜} 0.003499563 0.5714286 0.006124234
## [7] {牛心} => {高麗菜} 0.004374453 0.7142857 0.006124234
## [8] {牛心} => {溫體牛肉} 0.004374453 0.7142857 0.006124234
## [9] {溫體牛三層肉鍋} => {茼蒿} 0.006124234 0.6363636 0.009623797
## [10] {溫體牛三層肉鍋} => {溫體牛肉} 0.005249344 0.5454545 0.009623797
## lift count
## [1] 10.297297 2
## [2] 2.087671 2
## [3] 3.596602 5
## [4] 1.789432 4
## [5] 19.912892 5
## [6] 5.488595 4
## [7] 4.888794 5
## [8] 2.884907 5
## [9] 10.244558 7
## [10] 2.203020 6
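Because order data is still accumulating and the number of transactions is limited, eight of the first 10 rules are backed by five or fewer orders (count of 5 or less) and have correspondingly low coverage, so they are clearly not ideal association rules.
Faced with a large, messy volume of output, a better approach is to control the strength of the generated rules so that only a handful of strongly associated rules are produced. The following runs try several parameter adjustments: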
rules01 <- apriori(cow, parameter = list(support = 0.005, confidence = 0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [314 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules01 # show how many association rules rules01 contains
## set of 314 rules
inspect(rules01[1:10]) # show the first 10 rules
## lhs rhs support confidence coverage
## [1] {溫體牛三層肉鍋} => {茼蒿} 0.006124234 0.6363636 0.009623797
## [2] {溫體牛三層肉鍋} => {溫體牛肉} 0.005249344 0.5454545 0.009623797
## [3] {爆炒牛筋} => {溫體牛肉鍋3+1} 0.005249344 0.5000000 0.010498688
## [4] {麻油牛心} => {溫體牛肉3+1} 0.008748906 0.9090909 0.009623797
## [5] {牛腱} => {溫體牛肉鍋3+1} 0.005249344 0.5454545 0.009623797
## [6] {牛腱} => {茼蒿} 0.008748906 0.9090909 0.009623797
## [7] {玉米筍} => {溫體牛肉鍋} 0.006999125 0.6153846 0.011373578
## [8] {玉米筍} => {高麗菜} 0.007874016 0.6923077 0.011373578
## [9] {洋蔥炒牛肉} => {溫體牛肉湯} 0.011373578 0.5909091 0.019247594
## [10] {油菜炒牛肉} => {高麗菜} 0.006999125 0.5000000 0.013998250
## lift count
## [1] 10.244558 7
## [2] 2.203020 6
## [3] 8.529851 6
## [4] 5.383891 10
## [5] 9.305292 6
## [6] 14.635083 10
## [7] 5.765448 8
## [8] 4.738369 9
## [9] 1.850436 13
## [10] 3.422156 8
rules02 <- apriori(cow, parameter = list(support = 0.005, confidence = 0.7))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [121 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules02 # show how many association rules rules02 contains
## set of 121 rules
inspect(rules02[1:10]) # show the first 10 rules
## lhs rhs support confidence
## [1] {麻油牛心} => {溫體牛肉3+1} 0.008748906 0.9090909
## [2] {牛腱} => {茼蒿} 0.008748906 0.9090909
## [3] {油菜炒牛肉} => {牛肉貢丸(7粒)} 0.010498688 0.7500000
## [4] {牛肉貢丸(7粒), 炒牛雜麵} => {溫體牛肉} 0.005249344 1.0000000
## [5] {炒牛雜麵, 溫體牛肉} => {牛肉貢丸(7粒)} 0.005249344 0.8571429
## [6] {玉米筍, 綜合菇} => {高麗菜} 0.005249344 1.0000000
## [7] {溫體牛肉3+1, 薑絲炒牛肉} => {牛肉貢丸(7粒)} 0.005249344 0.8571429
## [8] {牛肉貢丸(7粒), 薑絲炒牛肉} => {溫體牛肉3+1} 0.005249344 1.0000000
## [9] {油菜炒牛肉, 溫體牛肉3+1} => {牛肉貢丸(7粒)} 0.005249344 1.0000000
## [10] {油菜炒牛肉, 高麗菜} => {牛肉貢丸(7粒)} 0.005249344 0.7500000
## coverage lift count
## [1] 0.009623797 5.383891 10
## [2] 0.009623797 14.635083 10
## [3] 0.013998250 3.878959 12
## [4] 0.005249344 4.038869 6
## [5] 0.006124234 4.433096 6
## [6] 0.005249344 6.844311 6
## [7] 0.006124234 4.433096 6
## [8] 0.005249344 5.922280 6
## [9] 0.005249344 5.171946 6
## [10] 0.006999125 3.878959 6
rules03 <- apriori(cow, parameter = list(support = 0.01, confidence = 0.8))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 11
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [3 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules03 # show how many association rules rules03 contains
## set of 3 rules
inspect(rules03) # look at the rules
## lhs rhs support confidence coverage
## [1] {牛肉貢丸(7粒), 青江菜} => {綜合菇} 0.01312336 0.8333333 0.01574803
## [2] {加肉, 蒜頭炒牛肉} => {溫體牛肉} 0.01137358 0.8125000 0.01399825
## [3] {高麗菜, 蒜頭炒牛肉} => {牛肉貢丸(7粒)} 0.01137358 0.8666667 0.01312336
## lift count
## [1] 9.430693 15
## [2] 3.281581 13
## [3] 4.482353 13
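We can also take an existing rule set (for example rules0), hold one parameter at a fixed threshold, and pick the strongest rules according to the others. For rules0, whose confidence threshold is 0.5, we sort by support, confidence and lift in turn, store the results as rules.sorted_supp, rules.sorted_conf and rules.sorted_lift, and take the top 6 rules from each:
rules.sorted_supp <- sort(rules0, by = "support")
inspect(rules.sorted_supp[1:6])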
## lhs rhs support confidence coverage
## [1] {加肉} => {溫體牛肉} 0.06299213 0.5496183 0.11461067
## [2] {炒牛肉麵} => {溫體牛肉湯} 0.04461942 0.5000000 0.08923885
## [3] {綜合菇} => {高麗菜} 0.04461942 0.5049505 0.08836395
## [4] {茼蒿} => {溫體牛肉鍋} 0.03324584 0.5352113 0.06211724
## [5] {高麗菜, 溫體牛肉3+1} => {牛肉貢丸(7粒)} 0.03324584 0.5588235 0.05949256
## [6] {牛肉貢丸(7粒), 加肉} => {溫體牛肉} 0.03149606 0.7826087 0.04024497
## lift count
## [1] 2.219837 72
## [2] 1.565753 51
## [3] 3.456038 51
## [4] 5.014315 38
## [5] 2.890205 38
## [6] 3.160854 36
rules.sorted_conf <- sort(rules0, by = "confidence")
inspect(rules.sorted_conf[1:6])
## lhs rhs support confidence coverage
## [1] {三杯牛尾} => {高麗菜炒牛肉} 0.001749781 1 0.001749781
## [2] {三菇炒牛肉, 牛筋湯} => {芥蘭炒牛肉} 0.001749781 1 0.001749781
## [3] {三菇炒牛肉, 牛筋湯} => {溫體牛肉湯} 0.001749781 1 0.001749781
## [4] {牛筋湯, 青椒炒牛肉} => {芥蘭炒牛肉} 0.001749781 1 0.001749781
## [5] {牛筋湯, 青椒炒牛肉} => {溫體牛肉湯} 0.001749781 1 0.001749781
## [6] {牛筋湯, 溫體牛肉湯} => {芥蘭炒牛肉} 0.003499563 1 0.003499563
## lift count
## [1] 10.297297 2
## [2] 5.035242 2
## [3] 3.131507 2
## [4] 5.035242 2
## [5] 3.131507 2
## [6] 5.035242 4
rules.sorted_lift <- sort(rules0, by = "lift")
inspect(rules.sorted_lift[1:6])
## lhs rhs support
## [1] {芥蘭炒牛肉, 青椒炒牛肉, 溫體牛肉湯} => {牛筋湯} 0.001749781
## [2] {金沙苦瓜牛肉, 茼蒿} => {溫體牛三層肉鍋} 0.001749781
## [3] {茼蒿, 高麗菜, 爆炒牛筋} => {溫體牛三層肉鍋} 0.002624672
## [4] {高麗菜, 溫體牛肉, 爆炒牛筋} => {溫體牛三層肉鍋} 0.002624672
## [5] {金沙苦瓜牛肉, 茼蒿, 高麗菜炒牛肉} => {溫體牛三層肉鍋} 0.001749781
## [6] {金沙苦瓜牛肉, 娃娃菜, 茼蒿} => {溫體牛三層肉鍋} 0.001749781
## confidence coverage lift count
## [1] 0.6666667 0.002624672 108.8571 2
## [2] 1.0000000 0.001749781 103.9091 2
## [3] 1.0000000 0.002624672 103.9091 3
## [4] 1.0000000 0.002624672 103.9091 3
## [5] 1.0000000 0.001749781 103.9091 2
## [6] 1.0000000 0.001749781 103.9091 2
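With the confidence threshold at 0.5, only the top 6 rules selected by support look practically meaningful, because each is backed by a count (number of orders) of at least 30. The top rules by confidence and by lift rest on only 2 to 4 orders each.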
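One way to keep lift-sorted results from being dominated by rules with tiny counts is to apply a count floor before sorting. The sketch below shows the idea on hypothetical toy baskets (items A, B, C), not the shop's data. The floor of 2 is arbitrary and would need to be much higher on real data.

```r
library(arules)

# Hypothetical toy baskets (not the shop's data)
baskets <- list(c("A", "B"), c("A", "B", "C"), c("A", "C"), c("B"), c("A", "B"))
tx <- as(baskets, "transactions")

rules <- apriori(tx,
                 parameter = list(support = 0.1, confidence = 0.6, minlen = 2),
                 control = list(verbose = FALSE))

# Keep only rules backed by at least 2 transactions, then rank by lift
solid  <- rules[quality(rules)$count >= 2]
ranked <- sort(solid, by = "lift")

inspect(ranked)
```

In this toy set the rule {B,C} => {A} has confidence 1 but rests on a single basket, so the count floor removes it before ranking.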
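Next, an example of applying association rules in practice. Stores often sell two products bundled together, perhaps because the retailer wants to promote one of them. Suppose the shop now wants to promote a less popular item, 蒜頭炒牛肉:
First consider rules05, the rules with support of at least 0.005 and confidence of at least 0.05: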
rules05 <- apriori(cow,
                   parameter = list(maxlen = 2, supp = 0.005, conf = 0.05),
                   appearance = list(rhs = "蒜頭炒牛肉", default = "lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.05 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf =
## 0.05), : Mining stopped (maxlen reached). Only patterns up to a length of 2
## returned!
## done [0.00s].
## writing ... [9 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules05)
## lhs rhs support confidence coverage lift
## [1] {炒牛雜} => {蒜頭炒牛肉} 0.006999125 0.12698413 0.05511811 2.687831
## [2] {溫體牛肉鍋} => {蒜頭炒牛肉} 0.006124234 0.05737705 0.10673666 1.214481
## [3] {綜合菇} => {蒜頭炒牛肉} 0.006124234 0.06930693 0.08836395 1.466997
## [4] {加肉} => {蒜頭炒牛肉} 0.013998250 0.12213740 0.11461067 2.585242
## [5] {娃娃菜} => {蒜頭炒牛肉} 0.007874016 0.07563025 0.10411199 1.600840
## [6] {溫體牛肉3+1} => {蒜頭炒牛肉} 0.009623797 0.05699482 0.16885389 1.206390
## [7] {高麗菜} => {蒜頭炒牛肉} 0.013123360 0.08982036 0.14610674 1.901198
## [8] {牛肉貢丸(7粒)} => {蒜頭炒牛肉} 0.015748031 0.08144796 0.19335083 1.723982
## [9] {溫體牛肉} => {蒜頭炒牛肉} 0.024496938 0.09893993 0.24759405 2.094229
## count
## [1] 8
## [2] 7
## [3] 7
## [4] 16
## [5] 9
## [6] 11
## [7] 15
## [8] 18
## [9] 28
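Raising the requirements to support of at least 0.01 and confidence of at least 0.1 narrows the result to a single rule.
rules06 <- apriori(cow,
                   parameter = list(maxlen = 2, supp = 0.01, conf = 0.1),
                   appearance = list(rhs = "蒜頭炒牛肉", default = "lhs"))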
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 11
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.01, conf = 0.1),
## : Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
## done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules06)
## lhs rhs support confidence coverage lift count
## [1] {加肉} => {蒜頭炒牛肉} 0.01399825 0.1221374 0.1146107 2.585242 16
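The result shows that “加肉” is the only item whose rule for “蒜頭炒牛肉” passes both thresholds (lift of about 2.59, backed by 16 orders), so the shop could consider selling these two products as a bundle.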
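The key to this kind of targeted search is the appearance argument, which pins the promoted item to the right-hand side and lets every other item appear only on the left. A minimal sketch on hypothetical toy baskets, with "C" standing in for the item to promote:

```r
library(arules)

# Hypothetical toy baskets (not the shop's data); "C" plays the promoted item
baskets <- list(c("A", "B"), c("A", "B", "C"), c("A", "C"), c("B"), c("A", "B"))
tx <- as(baskets, "transactions")

# Only rules of the form {something} => {C} are generated
target_rules <- apriori(tx,
                        parameter = list(support = 0.3, confidence = 0.4, minlen = 2),
                        appearance = list(rhs = "C", default = "lhs"),
                        control = list(verbose = FALSE))

inspect(sort(target_rules, by = "lift"))
```

Setting minlen = 2 drops the rule with an empty left-hand side, which says nothing about what to bundle with the target item.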
Suppose the shop now wants to promote another less popular item, 青蔥炒牛肉:
rules07 <- apriori(cow,
                   parameter = list(maxlen = 2, supp = 0.005, conf = 0.05),
                   appearance = list(rhs = "青蔥炒牛肉", default = "lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.05 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf =
## 0.05), : Mining stopped (maxlen reached). Only patterns up to a length of 2
## returned!
## done [0.00s].
## writing ... [5 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules07)
## lhs rhs support confidence coverage lift
## [1] {薑絲炒牛肚} => {青蔥炒牛肉} 0.005249344 0.13043478 0.04024497 3.172063
## [2] {青江菜} => {青蔥炒牛肉} 0.005249344 0.14634146 0.03587052 3.558900
## [3] {炒牛肉麵} => {青蔥炒牛肉} 0.006999125 0.07843137 0.08923885 1.907384
## [4] {加肉} => {青蔥炒牛肉} 0.006124234 0.05343511 0.11461067 1.299497
## [5] {溫體牛肉湯} => {青蔥炒牛肉} 0.016622922 0.05205479 0.31933508 1.265928
## count
## [1] 6
## [2] 6
## [3] 8
## [4] 7
## [5] 19
Raising the confidence requirement to at least 0.1 narrows the result to two rules.
rules08 <- apriori(cow,
                   parameter = list(maxlen = 2, supp = 0.005, conf = 0.10),
                   appearance = list(rhs = "青蔥炒牛肉", default = "lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf = 0.1),
## : Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
## done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules08)
## lhs rhs support confidence coverage lift
## [1] {薑絲炒牛肚} => {青蔥炒牛肉} 0.005249344 0.1304348 0.04024497 3.172063
## [2] {青江菜} => {青蔥炒牛肉} 0.005249344 0.1463415 0.03587052 3.558900
## count
## [1] 6
## [2] 6
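The result shows that “薑絲炒牛肚” and “青江菜” have the strongest association with “青蔥炒牛肉” (lift of about 3.2 and 3.6). The two rules are very close in support, confidence and lift, so either item could be considered for bundling with “青蔥炒牛肉”. Each rule is backed by only 6 orders, however, so the evidence is still thin.
Next we use the eclat function to find the item combinations best suited to bundled sales or to being placed near each other:
elc_set <- eclat(cow,
                 parameter = list(minlen = 1, maxlen = 3, supp = 0.005, target = "frequent itemsets"),
                 control = list(sort = -1))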
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.005 1 3 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -1 TRUE
##
## Absolute minimum support count: 5
##
## create itemset ...
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [45 item(s)] done [0.00s].
## creating sparse bit matrix ... [45 row(s), 1143 column(s)] done [0.00s].
## writing ... [485 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(elc_set[1:5])
## items support count
## [1] {麻油牛心, 溫體牛肉3+1} 0.008748906 10
## [2] {牛腱, 茼蒿} 0.008748906 10
## [3] {牛腱, 溫體牛肉鍋3+1} 0.005249344 6
## [4] {溫體牛三層肉鍋, 溫體牛肉} 0.005249344 6
## [5] {茼蒿, 溫體牛三層肉鍋} 0.006124234 7
Adjust the parameters (minimum support raised to 0.01):
elc_set <- eclat(cow,
                 parameter = list(minlen = 1, maxlen = 3, supp = 0.01, target = "frequent itemsets"),
                 control = list(sort = -1))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.01 1 3 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -1 TRUE
##
## Absolute minimum support count: 11
##
## create itemset ...
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating sparse bit matrix ... [40 row(s), 1143 column(s)] done [0.00s].
## writing ... [206 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(elc_set[1:5])
## items support count
## [1] {牛肉貢丸(7粒), 油菜炒牛肉} 0.01049869 12
## [2] {冬粉, 凍豆腐} 0.01049869 12
## [3] {洋蔥炒牛肉, 溫體牛肉湯} 0.01137358 13
## [4] {三菇炒牛肉, 溫體牛肉湯} 0.01137358 13
## [5] {牛三層肉, 溫體牛肉} 0.01224847 14
Adjust the parameters again (minimum support raised to 0.07):
elc_set <- eclat(cow,
                 parameter = list(minlen = 1, maxlen = 3, supp = 0.07, target = "frequent itemsets"),
                 control = list(sort = -1))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.07 1 3 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -1 TRUE
##
## Absolute minimum support count: 80
##
## create itemset ...
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [13 item(s)] done [0.00s].
## creating bit matrix ... [13 row(s), 1143 column(s)] done [0.00s].
## writing ... [15 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(elc_set[1:5])
## items support count
## [1] {牛肉貢丸(7粒), 溫體牛肉} 0.08836395 101
## [2] {芥蘭炒牛肉, 溫體牛肉湯} 0.07261592 83
## [3] {溫體牛肉湯} 0.31933508 365
## [4] {溫體牛肉} 0.24759405 283
## [5] {芥蘭炒牛肉} 0.19860017 227
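In the end, two combinations stand out as candidates for bundled sales: {牛肉貢丸(7粒), 溫體牛肉} and {芥蘭炒牛肉, 溫體牛肉湯}. Of the 15 itemsets returned, 13 are single items, so these are the only multi-item sets that reach a support of 0.07, and their transaction counts (101 and 83) are substantial.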
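Because the itemsets returned by eclat are not necessarily listed in order of support, it can help to restrict the search to multi-item sets and sort explicitly before inspecting the top entries. A minimal sketch on hypothetical toy baskets (items A, B, C), not the shop's data:

```r
library(arules)

# Hypothetical toy baskets (not the shop's data)
baskets <- list(c("A", "B"), c("A", "B", "C"), c("A", "C"), c("B"), c("A", "B"))
tx <- as(baskets, "transactions")

# minlen = 2 skips single items, which cannot be bundled with anything
itemsets <- eclat(tx,
                  parameter = list(support = 0.3, minlen = 2),
                  control = list(verbose = FALSE))

# Rank the combinations by support, highest first
ranked <- sort(itemsets, by = "support")
inspect(ranked)
```

Here {A,B} (3 of 5 baskets) ranks ahead of {A,C} (2 of 5 baskets).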
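Plots give a more intuitive, visual view of association analysis results. This requires the R extension package arulesViz.
A few simple uses of the package are introduced here.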
# install.packages("arulesViz", repos="http://cran.us.r-project.org")
library (arulesViz)
# cow was already created by read.transactions() above, so no data() call is needed
Next, set support to 0.002 and confidence to 0.5, and name the resulting rules rules09.
rules09 <- apriori(cow, parameter = list(support = 0.002, confidence = 0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.002 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1143 transaction(s)] done [0.00s].
## sorting and recoding items ... [46 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.00s].
## writing ... [2311 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules09
## set of 2311 rules
Next, draw a scatter plot of rules09:
plot(rules09)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
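Each point in the plot represents one rule, positioned by its support (horizontal axis) and confidence (vertical axis), and the color intensity is determined by the rule's lift. The variables mapped to the axes and to color can be changed through arguments, for example: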
plot(rules09, measure=c("support", "lift"), shading ="confidence")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
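Looking at the plot alone does not tell us which items a rule of interest involves. Setting the interactive argument overcomes this. The interactive part cannot be rendered on a web page, so run it in R and click with the mouse to select specific rules. The call is left commented out below.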
# plot(rules09, interactive=TRUE)
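When an interactive session is not available, a non-interactive alternative is to select the rules that fall in a region of the scatter plot by filtering on the same quality measures and then inspecting them. The sketch below uses hypothetical toy baskets (items A, B, C) and arbitrary cutoffs.

```r
library(arules)

# Hypothetical toy baskets (not the shop's data)
baskets <- list(c("A", "B"), c("A", "B", "C"), c("A", "C"), c("B"), c("A", "B"))
tx <- as(baskets, "transactions")

rules <- apriori(tx,
                 parameter = list(support = 0.3, confidence = 0.6, minlen = 2),
                 control = list(verbose = FALSE))

# Pick the rules in the upper-right region of a support/confidence scatter plot
q <- quality(rules)
picked <- rules[q$support >= 0.5 & q$confidence >= 0.7]

inspect(picked)
```

The same pattern applied to rules09 would list the items behind any cluster of points that looks interesting in the plot.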
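We can also set the shading argument to “order” to draw a special kind of scatter plot, the “two-key plot”, in which color intensity represents the number of items in a rule: the more items a rule contains (the higher its order), the darker the point.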
plot(rules09, shading="order", control=list(main = "Two-key plot"))
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
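Next, change the plot type to “grouped” to produce a “grouped matrix” plot. Judged by lift, the two most strongly associated groups (darkest circles) are those containing 薑絲炒牛肚 and 牛筋, among other items, paired with 牛腱. The support measure can be used in the same way to find the strongest combinations (largest circles).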
plot(rules09, method= "grouped")
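The method argument can also be set to “matrix”, “matrix3D”, “paracoord” and others:
The plot below uses the “matrix” method to show the first 50 rules of rules09, which involve 31 LHS (Left Hand Side) itemsets and 15 RHS (Right Hand Side) itemsets, as listed in the output. Color intensity represents the lift of each rule.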
plot(rules09[1:50], method="matrix", measure="lift")
## Itemsets in Antecedent (LHS)
## [1] "{高麗菜,溫體牛三層肉鍋}" "{茼蒿,爆炒牛筋}"
## [3] "{高麗菜,爆炒牛筋}" "{牛心,娃娃菜}"
## [5] "{牛心,高麗菜}" "{牛腱}"
## [7] "{冬粉}" "{溫體牛三層肉鍋,爆炒牛筋}"
## [9] "{牛心,溫體牛肉}" "{爆炒牛筋}"
## [11] "{牛心}" "{溫體牛三層肉鍋}"
## [13] "{白蘿蔔}" "{麻油牛心}"
## [15] "{玉米筍}" "{牛筋湯,溫體牛肉湯}"
## [17] "{茼蒿}" "{牛心,青江菜}"
## [19] "{凍豆腐}" "{青江菜}"
## [21] "{玉米}" "{綜合菇}"
## [23] "{油菜炒牛肉}" "{牛雜}"
## [25] "{牛筋湯}" "{牛筋湯,芥蘭炒牛肉}"
## [27] "{加肉}" "{蒜頭炒牛肉}"
## [29] "{洋蔥炒牛肉}" "{三菇炒牛肉}"
## [31] "{炒牛肉麵}"
## Itemsets in Consequent (RHS)
## [1] "{溫體牛肉湯}" "{溫體牛肉}" "{牛肉貢丸(7粒)}" "{芥蘭炒牛肉}"
## [5] "{高麗菜}" "{溫體牛肉鍋}" "{溫體牛肉3+1}" "{綜合菇}"
## [9] "{娃娃菜}" "{溫體牛肉鍋3+1}" "{茼蒿}" "{凍豆腐}"
## [13] "{青江菜}" "{溫體牛三層肉鍋}" "{爆炒牛筋}"
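The number of distinct antecedents and consequents shown by the matrix plot can also be counted directly from a rule set. The sketch below does so on hypothetical toy baskets (items A, B, C). Applied to rules09[1:50], the same two lines should reproduce the lengths of the lists printed above.

```r
library(arules)

# Hypothetical toy baskets (not the shop's data)
baskets <- list(c("A", "B"), c("A", "B", "C"), c("A", "C"), c("B"), c("A", "B"))
tx <- as(baskets, "transactions")

rules <- apriori(tx,
                 parameter = list(support = 0.3, confidence = 0.6, minlen = 2),
                 control = list(verbose = FALSE))

# Distinct left-hand-side and right-hand-side itemsets among the rules
n_lhs <- length(unique(labels(lhs(rules))))
n_rhs <- length(unique(labels(rhs(rules))))

print(c(lhs = n_lhs, rhs = n_rhs))
```

The toy rules are {A} => {B}, {B} => {A} and {C} => {A}, giving three distinct antecedents and two distinct consequents.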
The first 50 rules of rules09 in full:
inspect(rules09[1:50])
## lhs rhs support confidence
## [1] {牛筋湯} => {芥蘭炒牛肉} 0.004374453 0.7142857
## [2] {牛筋湯} => {溫體牛肉湯} 0.003499563 0.5714286
## [3] {牛心} => {青江菜} 0.004374453 0.7142857
## [4] {牛心} => {娃娃菜} 0.003499563 0.5714286
## [5] {牛心} => {高麗菜} 0.004374453 0.7142857
## [6] {牛心} => {溫體牛肉} 0.004374453 0.7142857
## [7] {溫體牛三層肉鍋} => {茼蒿} 0.006124234 0.6363636
## [8] {溫體牛三層肉鍋} => {溫體牛肉} 0.005249344 0.5454545
## [9] {麻油牛心} => {溫體牛肉3+1} 0.008748906 0.9090909
## [10] {爆炒牛筋} => {溫體牛肉鍋3+1} 0.005249344 0.5000000
## [11] {牛腱} => {溫體牛肉鍋3+1} 0.005249344 0.5454545
## [12] {牛腱} => {茼蒿} 0.008748906 0.9090909
## [13] {玉米筍} => {溫體牛肉鍋} 0.006999125 0.6153846
## [14] {玉米筍} => {高麗菜} 0.007874016 0.6923077
## [15] {洋蔥炒牛肉} => {溫體牛肉湯} 0.011373578 0.5909091
## [16] {油菜炒牛肉} => {高麗菜} 0.006999125 0.5000000
## [17] {油菜炒牛肉} => {牛肉貢丸(7粒)} 0.010498688 0.7500000
## [18] {油菜炒牛肉} => {溫體牛肉} 0.007874016 0.5625000
## [19] {白蘿蔔} => {溫體牛肉鍋3+1} 0.006124234 0.5384615
## [20] {白蘿蔔} => {牛肉貢丸(7粒)} 0.006124234 0.5384615
## [21] {三菇炒牛肉} => {溫體牛肉湯} 0.011373578 0.5200000
## [22] {冬粉} => {凍豆腐} 0.010498688 0.5454545
## [23] {冬粉} => {溫體牛肉鍋3+1} 0.009623797 0.5000000
## [24] {牛雜} => {溫體牛肉} 0.021872266 0.6756757
## [25] {凍豆腐} => {溫體牛肉鍋} 0.018372703 0.5121951
## [26] {青江菜} => {綜合菇} 0.019247594 0.5365854
## [27] {青江菜} => {溫體牛肉} 0.022747157 0.6341463
## [28] {蒜頭炒牛肉} => {溫體牛肉} 0.024496938 0.5185185
## [29] {炒牛肉麵} => {溫體牛肉湯} 0.044619423 0.5000000
## [30] {茼蒿} => {溫體牛肉鍋} 0.033245844 0.5352113
## [31] {玉米} => {高麗菜} 0.030621172 0.5303030
## [32] {綜合菇} => {高麗菜} 0.044619423 0.5049505
## [33] {加肉} => {溫體牛肉} 0.062992126 0.5496183
## [34] {牛筋湯, 芥蘭炒牛肉} => {溫體牛肉湯} 0.003499563 0.8000000
## [35] {牛筋湯, 溫體牛肉湯} => {芥蘭炒牛肉} 0.003499563 1.0000000
## [36] {牛心, 青江菜} => {娃娃菜} 0.002624672 0.6000000
## [37] {牛心, 娃娃菜} => {青江菜} 0.002624672 0.7500000
## [38] {牛心, 青江菜} => {高麗菜} 0.003499563 0.8000000
## [39] {牛心, 高麗菜} => {青江菜} 0.003499563 0.8000000
## [40] {牛心, 青江菜} => {溫體牛肉} 0.003499563 0.8000000
## [41] {牛心, 溫體牛肉} => {青江菜} 0.003499563 0.8000000
## [42] {牛心, 娃娃菜} => {溫體牛肉} 0.003499563 1.0000000
## [43] {牛心, 溫體牛肉} => {娃娃菜} 0.003499563 0.8000000
## [44] {牛心, 高麗菜} => {溫體牛肉} 0.002624672 0.6000000
## [45] {牛心, 溫體牛肉} => {高麗菜} 0.002624672 0.6000000
## [46] {溫體牛三層肉鍋, 爆炒牛筋} => {茼蒿} 0.002624672 1.0000000
## [47] {茼蒿, 爆炒牛筋} => {溫體牛三層肉鍋} 0.002624672 0.7500000
## [48] {溫體牛三層肉鍋, 爆炒牛筋} => {高麗菜} 0.002624672 1.0000000
## [49] {高麗菜, 溫體牛三層肉鍋} => {爆炒牛筋} 0.002624672 1.0000000
## [50] {高麗菜, 爆炒牛筋} => {溫體牛三層肉鍋} 0.002624672 0.7500000
## coverage lift count
## [1] 0.006124234 3.596602 5
## [2] 0.006124234 1.789432 4
## [3] 0.006124234 19.912892 5
## [4] 0.006124234 5.488595 4
## [5] 0.006124234 4.888794 5
## [6] 0.006124234 2.884907 5
## [7] 0.009623797 10.244558 7
## [8] 0.009623797 2.203020 6
## [9] 0.009623797 5.383891 10
## [10] 0.010498688 8.529851 6
## [11] 0.009623797 9.305292 6
## [12] 0.009623797 14.635083 10
## [13] 0.011373578 5.765448 8
## [14] 0.011373578 4.738369 9
## [15] 0.019247594 1.850436 13
## [16] 0.013998250 3.422156 8
## [17] 0.013998250 3.878959 12
## [18] 0.013998250 2.271864 9
## [19] 0.011373578 9.185993 7
## [20] 0.011373578 2.784894 7
## [21] 0.021872266 1.628384 13
## [22] 0.019247594 15.206208 12
## [23] 0.019247594 8.529851 11
## [24] 0.032370954 2.728966 25
## [25] 0.035870516 4.798681 21
## [26] 0.035870516 6.072446 22
## [27] 0.035870516 2.561234 26
## [28] 0.047244094 2.094229 28
## [29] 0.089238845 1.565753 51
## [30] 0.062117235 5.014315 38
## [31] 0.057742782 3.629559 35
## [32] 0.088363955 3.456038 51
## [33] 0.114610674 2.219837 72
## [34] 0.004374453 2.505205 4
## [35] 0.003499563 5.035242 4
## [36] 0.004374453 5.763025 3
## [37] 0.003499563 20.908537 3
## [38] 0.004374453 5.475449 4
## [39] 0.004374453 22.302439 4
## [40] 0.004374453 3.231095 4
## [41] 0.004374453 22.302439 4
## [42] 0.003499563 4.038869 4
## [43] 0.004374453 7.684034 4
## [44] 0.004374453 2.423322 3
## [45] 0.004374453 4.106587 3
## [46] 0.002624672 16.098592 3
## [47] 0.003499563 77.931818 3
## [48] 0.002624672 6.844311 3
## [49] 0.002624672 95.250000 3
## [50] 0.003499563 77.931818 3