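Market basket analysis, also known as association analysis, mines large volumes of transaction data for hidden business rules that link items together; the classic example is beer and diapers. Below, a simple worked example in R uses in-store sales data from a Tainan beef soup restaurant.
First, install and load the arules package.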
# install.packages("arules", repos="http://cran.us.r-project.org")
library("arules")
## Warning: package 'arules' was built under R version 4.2.3
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.2.3
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
Next, read the store's sales data and use summary() to get an overview.
cow <- read.transactions("cow.csv")
summary(cow)
## transactions as itemMatrix in sparse format with
## 578 rows (elements/itemsets/transactions) and
## 39 columns (items) and a density of 0.07563659
##
## most frequent items:
## 溫體牛肉湯 溫體牛肉 溫體牛肉3+1 芥蘭炒牛肉 牛肉貢丸(7粒)
## 194 163 142 128 111
## (Other)
## 967
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9
## 98 178 127 80 51 22 16 3 3
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 3.00 2.95 4.00 9.00
##
## includes extended item information - examples:
## labels
## 1 三菇炒牛肉
## 2 牛三層肉
## 3 牛三層肉湯
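The headline figures in this summary can be cross-checked by hand. The following is a minimal sketch that uses only the counts printed above; the variable names are illustrative and not part of arules.

```r
# Counts copied from the summary() output above.
n_transactions <- 578
n_items <- 39

# Order sizes 1..9 and how many orders have each size.
sizes  <- 1:9
counts <- c(98, 178, 127, 80, 51, 22, 16, 3, 3)

# Total item occurrences across all orders.
total_items <- sum(sizes * counts)

# Density = filled cells / all cells of the transaction-by-item matrix.
density <- total_items / (n_transactions * n_items)

# Mean number of distinct items per order.
mean_size <- total_items / n_transactions

cat("total item occurrences:", total_items, "\n")
cat("density:", round(density, 8), "\n")
cat("mean items per order:", round(mean_size, 2), "\n")
```

The size counts add up to the 578 transactions, and the computed density and mean agree with the 0.0756 and 2.95 reported by summary().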
Show the first eight transactions in the order data:
inspect(cow[1:8])
## items
## [1] {青椒炒牛肉, 綜合湯}
## [2] {牛肉貢丸(7粒), 加肉, 玉米, 娃娃菜, 高麗菜, 溫體牛肉3+1, 滑蛋炒牛肉}
## [3] {芥蘭炒牛肉, 溫體牛肉湯, 滑蛋炒牛肉}
## [4] {溫體牛肉湯}
## [5] {溫體牛肉湯}
## [6] {三菇炒牛肉, 溫體牛肉湯}
## [7] {溫體牛肉湯, 滑蛋炒牛肉}
## [8] {溫體牛肉湯, 滑蛋炒牛肉, 綜合湯}
Beyond the items themselves, the size function reports how many distinct items each order contains.
size(cow[1:8])
## [1] 2 7 3 1 1 2 2 3
Next, itemFrequency computes the share of transactions that contain each item, which also reveals the most frequently ordered items.
itemFrequency(cow)
## 三菇炒牛肉 牛三層肉 牛三層肉湯 牛心 牛心湯
## 0.027681661 0.031141869 0.055363322 0.008650519 0.005190311
## 牛肉貢丸(7粒) 牛骨隨 牛筋 牛筋湯 牛雜
## 0.192041522 0.019031142 0.036332180 0.008650519 0.034602076
## 牛雜湯 加肉 玉米 油菜炒牛肉 炒牛肉麵
## 0.031141869 0.089965398 0.057093426 0.015570934 0.107266436
## 炒牛雜 炒牛雜麵 芥蘭炒牛肉 金沙苦瓜牛肉 青江菜
## 0.058823529 0.024221453 0.221453287 0.024221453 0.041522491
## 青椒炒牛肉 青蔥炒牛肉 娃娃菜 洋蔥炒牛肉 茼蒿
## 0.060553633 0.051903114 0.117647059 0.024221453 0.001730104
## 高麗菜 高麗菜炒牛肉 麻油牛心 麻油牛腦 溫體牛肉
## 0.166089965 0.089965398 0.015570934 0.001730104 0.282006920
## 溫體牛肉3+1 溫體牛肉湯 滑蛋炒牛肉 綜合湯 綜合菇
## 0.245674740 0.335640138 0.164359862 0.067474048 0.105536332
## 蒜頭炒牛肉 蔥炒牛肉 薑絲炒牛肉 薑絲炒牛肚
## 0.058823529 0.001730104 0.019031142 0.050173010
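To make the meaning of these proportions concrete, here is a small base R sketch of the same calculation on toy data. The baskets and item names are hypothetical and stand in for the real orders.

```r
# Toy baskets (hypothetical data, not the cow.csv orders).
baskets <- list(
  c("soup", "stir_fry"),
  c("soup", "meatballs", "cabbage"),
  c("soup", "stir_fry", "cabbage"),
  c("meatballs", "cabbage"),
  c("soup")
)

# Relative frequency of an item = share of baskets that contain it.
# unique() guards against an item being listed twice in one basket.
item_counts <- table(unlist(lapply(baskets, unique)))
item_freq <- sort(item_counts / length(baskets), decreasing = TRUE)

print(item_freq)
```

With the real data, `sort(itemFrequency(cow), decreasing = TRUE)` should give the same kind of ranking directly.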
Use itemFrequencyPlot to plot the top 10 items.
itemFrequencyPlot(cow,topN = 10)
itemFrequencyPlot(cow,topN = 10,type = "absolute")
itemFrequencyPlot(cow,topN = 10,horiz = TRUE,
main = "Item Frequency",xlab = "Relative Frequency")
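Plot only the items whose relative frequency is at least 0.1. The support argument sets this minimum support; if it is omitted, every item is plotted and the chart becomes cluttered.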
itemFrequencyPlot(cow,support = 0.1,
main = "Item Frequency with S = 0.1",ylab = "Relative Frequency")
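Next, we try the apriori function (the Apriori algorithm) and the eclat function (Equivalence CLAss Transformation, Eclat) to see whether the data yields any interesting findings:
Roughly, Apriori works by first setting two thresholds, support and confidence, and then examining two further measures, lift and coverage:
Support: how prevalent the rule is in the data, that is, the probability that A and B appear together. \(support(A \longrightarrow B) = P(A,B)\)
Confidence: how reliable the rule is, that is, the conditional probability that B is bought given that A is bought. \(confidence(A \longrightarrow B) = P(B|A)=P(A,B)/P(A)\)
Lift: how much the presence of A raises the chance of B, that is, the ratio of the conditional probability \(P(B|A)\) to the baseline probability \(P(B)\). A lift above 1 means A and B appear together more often than they would if they were independent. \(lift(A \longrightarrow B) = P(B|A)/P(B) = confidence(A \longrightarrow B) / P(B)\)
Coverage: how often the left-hand side of the rule occurs, which determines how widely the rule applies; also called LHS-support. It is the share of orders that contain A. \(coverage(A \longrightarrow B) = P(A)\)
First, apply apriori with minimal restrictions and adjust based on the results. The minimum support is provisionally set to 0.001 and the minimum confidence to 0.5; the other parameters keep their defaults, and the resulting rules are named rules0: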
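The four definitions can be checked by hand for one rule from this data set, {加肉} => {溫體牛肉}. The sketch below uses counts read off the outputs in this walkthrough (52 orders with 加肉, 163 with 溫體牛肉, 43 with both); the variable names are illustrative.

```r
# Counts taken from the outputs in this walkthrough.
n        <- 578   # total transactions
count_A  <- 52    # orders containing 加肉 (item frequency 0.089965398 * 578)
count_B  <- 163   # orders containing 溫體牛肉 (from summary())
count_AB <- 43    # orders containing both (the rule's count column)

support    <- count_AB / n              # P(A, B)
coverage   <- count_A / n               # P(A)
confidence <- support / coverage        # P(B | A)
lift       <- confidence / (count_B / n)  # P(B | A) / P(B)

print(round(c(support = support, confidence = confidence,
              coverage = coverage, lift = lift), 6))
```

These values agree with what apriori reports for this rule (support 0.0744, confidence 0.8269, coverage 0.0900, lift 2.9323).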
rules0=apriori(cow,parameter=list(support=0.001,confidence=0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 0
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [39 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 done [0.00s].
## writing ... [13620 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
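The output shows the parameter specification (including the minimum support and confidence), the algorithmic control settings recorded for the run, and some execution details.
rules0 # show the number of association rules in rules0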
## set of 13620 rules
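rules0 contains 13620 association rules, so printing all of them would not be very informative. Moreover, the first 10 rules show that the order of the rules has no obvious relationship with the values of the four measures (support, confidence, lift, coverage).
inspect(rules0[1:10]) # show the first 10 rules in rules0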
## lhs rhs support confidence coverage lift
## [1] {蔥炒牛肉} => {溫體牛肉湯} 0.001730104 1.0000000 0.001730104 2.979381
## [2] {麻油牛腦} => {牛骨隨} 0.001730104 1.0000000 0.001730104 52.545455
## [3] {麻油牛腦} => {炒牛肉麵} 0.001730104 1.0000000 0.001730104 9.322581
## [4] {茼蒿} => {加肉} 0.001730104 1.0000000 0.001730104 11.115385
## [5] {茼蒿} => {滑蛋炒牛肉} 0.001730104 1.0000000 0.001730104 6.084211
## [6] {茼蒿} => {溫體牛肉} 0.001730104 1.0000000 0.001730104 3.546012
## [7] {牛心湯} => {溫體牛肉湯} 0.003460208 0.6666667 0.005190311 1.986254
## [8] {牛筋湯} => {芥蘭炒牛肉} 0.005190311 0.6000000 0.008650519 2.709375
## [9] {牛心} => {青江菜} 0.005190311 0.6000000 0.008650519 14.450000
## [10] {牛心} => {娃娃菜} 0.005190311 0.6000000 0.008650519 5.100000
## count
## [1] 1
## [2] 1
## [3] 1
## [4] 1
## [5] 1
## [6] 1
## [7] 2
## [8] 3
## [9] 3
## [10] 3
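Because order data is still accumulating and the number of transactions is limited, more than half (6) of the first 10 rules occur in only a single transaction. With such low support and coverage, these are clearly not useful association rules.
Faced with this much noisy output, a better approach is to control rule strength so that only a handful of strongly associated rules are generated. The following attempts adjust the parameters: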
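One way to quantify this problem for a whole rule set, rather than eyeballing the first 10 rules, is to look at the count column returned by quality(). The sketch below runs on toy baskets (hypothetical data and thresholds) so that it is self-contained.

```r
library(arules)

# Toy baskets (hypothetical data standing in for cow.csv).
baskets <- list(
  c("soup", "stir_fry"),
  c("soup", "meatballs", "cabbage"),
  c("soup", "stir_fry", "cabbage"),
  c("meatballs", "cabbage"),
  c("soup")
)
trans <- as(baskets, "transactions")

rules <- apriori(trans,
                 parameter = list(support = 0.2, confidence = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# quality() returns the rule measures as a data frame,
# including the raw number of supporting transactions (count).
q <- quality(rules)

# Share of rules that are backed by just one transaction.
share_single <- mean(q$count == 1)
cat("rules:", length(rules), " share with count == 1:", share_single, "\n")
```

Applied to the real rule set, `mean(quality(rules0)$count == 1)` would give the corresponding share for all 13620 rules.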
rules01=apriori(cow,parameter=list(support=0.005,confidence=0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [475 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules01 # show the number of association rules in rules01
## set of 475 rules
inspect(rules01[1:10]) # show the first 10 rules
## lhs rhs support confidence coverage
## [1] {牛筋湯} => {芥蘭炒牛肉} 0.005190311 0.6000000 0.008650519
## [2] {牛心} => {青江菜} 0.005190311 0.6000000 0.008650519
## [3] {牛心} => {娃娃菜} 0.005190311 0.6000000 0.008650519
## [4] {牛心} => {高麗菜} 0.005190311 0.6000000 0.008650519
## [5] {牛心} => {溫體牛肉} 0.005190311 0.6000000 0.008650519
## [6] {油菜炒牛肉} => {高麗菜} 0.008650519 0.5555556 0.015570934
## [7] {油菜炒牛肉} => {牛肉貢丸(7粒)} 0.010380623 0.6666667 0.015570934
## [8] {油菜炒牛肉} => {溫體牛肉} 0.010380623 0.6666667 0.015570934
## [9] {洋蔥炒牛肉} => {溫體牛肉湯} 0.012110727 0.5000000 0.024221453
## [10] {麻油牛心} => {溫體牛肉3+1} 0.013840830 0.8888889 0.015570934
## lift count
## [1] 2.709375 3
## [2] 14.450000 3
## [3] 5.100000 3
## [4] 3.612500 3
## [5] 2.127607 3
## [6] 3.344907 5
## [7] 3.471471 6
## [8] 2.364008 6
## [9] 1.489691 7
## [10] 3.618153 8
rules02=apriori(cow,parameter=list(support=0.005,confidence=0.7))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [174 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules02 # show the number of association rules in rules02
## set of 174 rules
inspect(rules02[1:10]) # show the first 10 rules
## lhs rhs support confidence
## [1] {麻油牛心} => {溫體牛肉3+1} 0.013840830 0.8888889
## [2] {牛雜} => {溫體牛肉} 0.029411765 0.8500000
## [3] {牛筋} => {溫體牛肉} 0.025951557 0.7142857
## [4] {加肉} => {溫體牛肉} 0.074394464 0.8269231
## [5] {牛心, 娃娃菜} => {溫體牛肉} 0.005190311 1.0000000
## [6] {牛心, 溫體牛肉} => {娃娃菜} 0.005190311 1.0000000
## [7] {牛肉貢丸(7粒), 炒牛雜麵} => {溫體牛肉} 0.005190311 1.0000000
## [8] {炒牛雜麵, 溫體牛肉} => {牛肉貢丸(7粒)} 0.005190311 0.7500000
## [9] {牛筋, 油菜炒牛肉} => {高麗菜} 0.005190311 1.0000000
## [10] {牛筋, 油菜炒牛肉} => {牛肉貢丸(7粒)} 0.005190311 1.0000000
## coverage lift count
## [1] 0.015570934 3.618153 8
## [2] 0.034602076 3.014110 17
## [3] 0.036332180 2.532866 15
## [4] 0.089965398 2.932279 43
## [5] 0.005190311 3.546012 3
## [6] 0.005190311 8.500000 3
## [7] 0.005190311 3.546012 3
## [8] 0.006920415 3.905405 3
## [9] 0.005190311 6.020833 3
## [10] 0.005190311 5.207207 3
rules03=apriori(cow,parameter=list(support=0.01,confidence=0.8))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [33 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules03 # show the number of association rules in rules03
## set of 14 rules
inspect(rules03) # inspect the rules
## lhs rhs support
## [1] {麻油牛心} => {溫體牛肉3+1} 0.01384083
## [2] {牛雜} => {溫體牛肉} 0.02941176
## [3] {加肉} => {溫體牛肉} 0.07439446
## [4] {牛筋, 芥蘭炒牛肉} => {溫體牛肉} 0.01038062
## [5] {牛肉貢丸(7粒), 青江菜} => {綜合菇} 0.01557093
## [6] {加肉, 蒜頭炒牛肉} => {溫體牛肉} 0.01211073
## [7] {高麗菜, 蒜頭炒牛肉} => {牛肉貢丸(7粒)} 0.01384083
## [8] {加肉, 玉米} => {高麗菜} 0.01038062
## [9] {炒牛肉麵, 滑蛋炒牛肉} => {溫體牛肉湯} 0.01038062
## [10] {牛肉貢丸(7粒), 加肉} => {溫體牛肉} 0.03460208
## [11] {青江菜, 溫體牛肉, 綜合菇} => {牛肉貢丸(7粒)} 0.01038062
## [12] {牛肉貢丸(7粒), 青江菜, 溫體牛肉} => {綜合菇} 0.01038062
## [13] {牛肉貢丸(7粒), 玉米, 高麗菜} => {溫體牛肉3+1} 0.01384083
## [14] {牛肉貢丸(7粒), 玉米, 溫體牛肉3+1} => {高麗菜} 0.01384083
## confidence coverage lift count
## [1] 0.8888889 0.01557093 3.618153 8
## [2] 0.8500000 0.03460208 3.014110 17
## [3] 0.8269231 0.08996540 2.932279 43
## [4] 1.0000000 0.01038062 3.546012 6
## [5] 0.9000000 0.01730104 8.527869 9
## [6] 1.0000000 0.01211073 3.546012 7
## [7] 0.8888889 0.01557093 4.628629 8
## [8] 0.8571429 0.01211073 5.160714 6
## [9] 0.8571429 0.01211073 2.553756 6
## [10] 0.8000000 0.04325260 2.836810 20
## [11] 0.8571429 0.01211073 4.463320 6
## [12] 1.0000000 0.01038062 9.475410 6
## [13] 0.8000000 0.01730104 3.256338 8
## [14] 0.8000000 0.01730104 4.816667 8
rules04=apriori(cow,parameter=list(support=0.015,confidence=0.8))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.015 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 8
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [33 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules04 # show the number of association rules in rules04
## set of 4 rules
inspect(rules04) # inspect the rules
## lhs rhs support confidence coverage
## [1] {牛雜} => {溫體牛肉} 0.02941176 0.8500000 0.03460208
## [2] {加肉} => {溫體牛肉} 0.07439446 0.8269231 0.08996540
## [3] {牛肉貢丸(7粒), 青江菜} => {綜合菇} 0.01557093 0.9000000 0.01730104
## [4] {牛肉貢丸(7粒), 加肉} => {溫體牛肉} 0.03460208 0.8000000 0.04325260
## lift count
## [1] 3.014110 17
## [2] 2.932279 43
## [3] 8.527869 9
## [4] 2.836810 20
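The result is four association rules, each with coverage above 0.017.
We can also take an existing rule set (for example rules0), hold one parameter at a fixed threshold, and pick the strongest rules according to another measure. For rules0, which was mined with a confidence threshold of 0.5, we sort by support, confidence, and lift in turn, store the results as rules.sorted_supp, rules.sorted_conf, and rules.sorted_lift, and select the top 6 rules from each: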
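The eclat function mentioned earlier takes a different route from apriori: it mines frequent itemsets using a support threshold only, and rules are derived from those itemsets in a second step. The following is a minimal sketch on toy baskets; the data and thresholds are hypothetical and were not taken from this analysis.

```r
library(arules)

# Toy baskets (hypothetical data standing in for cow.csv).
baskets <- list(
  c("soup", "stir_fry"),
  c("soup", "meatballs", "cabbage"),
  c("soup", "stir_fry", "cabbage"),
  c("meatballs", "cabbage"),
  c("soup")
)
trans <- as(baskets, "transactions")

# Step 1: mine frequent itemsets with eclat (support threshold only).
itemsets <- eclat(trans,
                  parameter = list(support = 0.4, minlen = 2),
                  control = list(verbose = FALSE))

# Step 2: derive rules from those itemsets with a confidence threshold.
rules_eclat <- ruleInduction(itemsets, trans, confidence = 0.5)

inspect(sort(rules_eclat, by = "confidence"))
```

On the real data the same two steps would start from `eclat(cow, ...)`; for equal thresholds the resulting rules should correspond to those found by apriori.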
rules.sorted_supp = sort ( rules0, by="support" )
inspect ( rules.sorted_supp [1:6] )
## lhs rhs support confidence coverage
## [1] {牛肉貢丸(7粒)} => {溫體牛肉} 0.09688581 0.5045045 0.19204152
## [2] {高麗菜} => {牛肉貢丸(7粒)} 0.08477509 0.5104167 0.16608997
## [3] {加肉} => {溫體牛肉} 0.07439446 0.8269231 0.08996540
## [4] {娃娃菜} => {溫體牛肉} 0.06228374 0.5294118 0.11764706
## [5] {綜合菇} => {溫體牛肉} 0.05363322 0.5081967 0.10553633
## [6] {牛肉貢丸(7粒), 高麗菜} => {溫體牛肉3+1} 0.04325260 0.5102041 0.08477509
## lift count
## [1] 1.788979 56
## [2] 2.657845 49
## [3] 2.932279 43
## [4] 1.877301 36
## [5] 1.802072 31
## [6] 2.076746 25
rules.sorted_conf = sort ( rules0, by="confidence" )
inspect ( rules.sorted_conf [1:6] )
## lhs rhs support confidence coverage lift
## [1] {蔥炒牛肉} => {溫體牛肉湯} 0.001730104 1 0.001730104 2.979381
## [2] {麻油牛腦} => {牛骨隨} 0.001730104 1 0.001730104 52.545455
## [3] {麻油牛腦} => {炒牛肉麵} 0.001730104 1 0.001730104 9.322581
## [4] {茼蒿} => {加肉} 0.001730104 1 0.001730104 11.115385
## [5] {茼蒿} => {滑蛋炒牛肉} 0.001730104 1 0.001730104 6.084211
## [6] {茼蒿} => {溫體牛肉} 0.001730104 1 0.001730104 3.546012
## count
## [1] 1
## [2] 1
## [3] 1
## [4] 1
## [5] 1
## [6] 1
rules.sorted_lift = sort ( rules0, by="lift" )
inspect ( rules.sorted_lift [1:6] )
## lhs rhs support confidence coverage
## [1] {牛骨隨, 炒牛肉麵} => {麻油牛腦} 0.001730104 1 0.001730104
## [2] {牛骨隨, 麻油牛心} => {牛筋湯} 0.001730104 1 0.001730104
## [3] {牛骨隨, 炒牛雜} => {牛筋湯} 0.001730104 1 0.001730104
## [4] {牛骨隨, 芥蘭炒牛肉} => {牛筋湯} 0.001730104 1 0.001730104
## [5] {青椒炒牛肉, 麻油牛心} => {牛筋湯} 0.001730104 1 0.001730104
## [6] {炒牛雜, 麻油牛心} => {牛筋湯} 0.001730104 1 0.001730104
## lift count
## [1] 578.0 1
## [2] 115.6 1
## [3] 115.6 1
## [4] 115.6 1
## [5] 115.6 1
## [6] 115.6 1
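With the confidence threshold at 0.5, only the top 6 rules by support are of practical value, because each is backed by at least 20 orders (count). The top rules by confidence and by lift each rest on a single order.
Next, an example shows a practical application of association rules. Stores often sell two products bundled together, perhaps because the merchant wants to promote one of them. Suppose the restaurant wants to promote a less popular item: 高麗菜炒牛肉.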
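Ranking by confidence or lift becomes more useful once rules with too few supporting orders are removed first. The sketch below shows this filter-then-rank pattern on toy baskets; the data and the minimum count of 2 are hypothetical choices for illustration.

```r
library(arules)

# Toy baskets (hypothetical data standing in for cow.csv).
baskets <- list(
  c("soup", "stir_fry"),
  c("soup", "meatballs", "cabbage"),
  c("soup", "stir_fry", "cabbage"),
  c("meatballs", "cabbage"),
  c("soup")
)
trans <- as(baskets, "transactions")

rules <- apriori(trans,
                 parameter = list(support = 0.2, confidence = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# Keep only rules observed in at least 2 orders ...
supported <- subset(rules, subset = count >= 2)

# ... then take the strongest few by lift.
top_lift <- head(supported, n = 3, by = "lift")
inspect(top_lift)
```

For rules0, a floor such as `count >= 20` before sorting by lift would be one way to avoid the single-order rules seen above; the exact floor is a judgment call.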
rules05=apriori(cow,parameter=list(maxlen=2,supp=0.005,conf=0.05),appearance=list(rhs="高麗菜炒牛肉",default="lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.05 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf =
## 0.05), : Mining stopped (maxlen reached). Only patterns up to a length of 2
## returned!
## done [0.00s].
## writing ... [8 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
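Inspect the eight candidate rules first. The next call (rules06) then raises the minimum support to 0.02 and the minimum confidence to 0.1, which narrows the result to a single rule.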
inspect(rules05)
## lhs rhs support confidence coverage lift
## [1] {} => {高麗菜炒牛肉} 0.089965398 0.08996540 1.00000000 1.0000000
## [2] {牛三層肉湯} => {高麗菜炒牛肉} 0.005190311 0.09375000 0.05536332 1.0420673
## [3] {牛三層肉} => {高麗菜炒牛肉} 0.005190311 0.16666667 0.03114187 1.8525641
## [4] {薑絲炒牛肚} => {高麗菜炒牛肉} 0.008650519 0.17241379 0.05017301 1.9164456
## [5] {滑蛋炒牛肉} => {高麗菜炒牛肉} 0.012110727 0.07368421 0.16435986 0.8190283
## [6] {高麗菜} => {高麗菜炒牛肉} 0.008650519 0.05208333 0.16608997 0.5789263
## [7] {溫體牛肉湯} => {高麗菜炒牛肉} 0.043252595 0.12886598 0.33564014 1.4323949
## [8] {溫體牛肉3+1} => {高麗菜炒牛肉} 0.022491349 0.09154930 0.24567474 1.0176056
## count
## [1] 52
## [2] 3
## [3] 3
## [4] 5
## [5] 7
## [6] 5
## [7] 25
## [8] 13
rules06=apriori(cow,parameter=list(maxlen=2,supp=0.02,conf=0.1),appearance=list(rhs="高麗菜炒牛肉",default="lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.02 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 11
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [29 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.02, conf = 0.1),
## : Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
## done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules06)
## lhs rhs support confidence coverage lift
## [1] {溫體牛肉湯} => {高麗菜炒牛肉} 0.0432526 0.128866 0.3356401 1.432395
## count
## [1] 25
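The result shows that 溫體牛肉湯 is the strongest (indeed the only) item whose rule with 高麗菜炒牛肉 meets both thresholds, so the two could be considered for a bundle.
Suppose the restaurant now wants to promote another less popular item: 青蔥炒牛肉.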
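Rule [1] in inspect(rules05) has an empty left-hand side, so it only restates how often the target item is ordered overall. Setting minlen = 2 leaves such rules out. The sketch below shows the same target-item search on toy baskets; the data, item names, and thresholds are hypothetical.

```r
library(arules)

# Toy baskets (hypothetical data standing in for cow.csv).
baskets <- list(
  c("soup", "stir_fry"),
  c("soup", "meatballs", "cabbage"),
  c("soup", "stir_fry", "cabbage"),
  c("meatballs", "cabbage"),
  c("soup")
)
trans <- as(baskets, "transactions")

# Only rules of the form {one item} => {cabbage}:
#   minlen = 2 drops the empty-LHS rule, maxlen = 2 keeps a single LHS item.
# suppressWarnings() hides the expected "maxlen reached" notice.
rules_rhs <- suppressWarnings(apriori(
  trans,
  parameter  = list(minlen = 2, maxlen = 2, support = 0.2, confidence = 0.1),
  appearance = list(rhs = "cabbage", default = "lhs"),
  control    = list(verbose = FALSE)
))

inspect(sort(rules_rhs, by = "lift"))
```

Sorting by lift puts the candidate bundle partners in order, and the count column shows how many orders each one rests on.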
rules07=apriori(cow,parameter=list(maxlen=2,supp=0.005,conf=0.05),appearance=list(rhs="青蔥炒牛肉",default="lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.05 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf =
## 0.05), : Mining stopped (maxlen reached). Only patterns up to a length of 2
## returned!
## done [0.00s].
## writing ... [8 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules07)
## lhs rhs support confidence coverage lift
## [1] {} => {青蔥炒牛肉} 0.051903114 0.05190311 1.00000000 1.000000
## [2] {薑絲炒牛肚} => {青蔥炒牛肉} 0.006920415 0.13793103 0.05017301 2.657471
## [3] {炒牛雜} => {青蔥炒牛肉} 0.005190311 0.08823529 0.05882353 1.700000
## [4] {炒牛肉麵} => {青蔥炒牛肉} 0.008650519 0.08064516 0.10726644 1.553763
## [5] {加肉} => {青蔥炒牛肉} 0.008650519 0.09615385 0.08996540 1.852564
## [6] {娃娃菜} => {青蔥炒牛肉} 0.006920415 0.05882353 0.11764706 1.133333
## [7] {高麗菜} => {青蔥炒牛肉} 0.008650519 0.05208333 0.16608997 1.003472
## [8] {溫體牛肉湯} => {青蔥炒牛肉} 0.020761246 0.06185567 0.33564014 1.191753
## count
## [1] 30
## [2] 4
## [3] 3
## [4] 5
## [5] 5
## [6] 4
## [7] 5
## [8] 12
Raising the minimum confidence to 0.1 narrows the rules down to one.
rules08=apriori(cow,parameter=list(maxlen=2,supp=0.005,conf=0.10),appearance=list(rhs="青蔥炒牛肉",default="lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf = 0.1),
## : Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
## done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules08)
## lhs rhs support confidence coverage lift
## [1] {薑絲炒牛肚} => {青蔥炒牛肉} 0.006920415 0.137931 0.05017301 2.657471
## count
## [1] 4
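The result shows that 薑絲炒牛肚 is the item most strongly associated with 青蔥炒牛肉 under these thresholds, so the two could be considered for a bundle, although the rule rests on only 4 orders.
Plots make the results of association analysis easier to read at a glance. This requires the R extension package arulesViz.
Below are some simple uses of the package.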
# install.packages("arulesViz", repos="http://cran.us.r-project.org")
library (arulesViz)
## Warning: package 'arulesViz' was built under R version 4.2.3
# cow is already in the workspace from read.transactions(), so no data() call is needed
Next, set support to 0.002 and confidence to 0.5, and name the resulting rules rules09.
rules09 = apriori (cow, parameter = list ( support=0.002, confidence=0.5 ) )
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.002 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 1
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [1270 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules09
## set of 1270 rules
Next, draw a scatter plot of rules09:
plot(rules09)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
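Each point represents one rule, placed by its support (horizontal axis) and confidence (vertical axis), with color intensity determined by lift. The variables mapped to the axes and to color can be changed through arguments, for example: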
plot(rules09, measure=c("support", "lift"), shading ="confidence")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
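The plot alone does not reveal which items a rule of interest involves. The interactive option addresses this (the interactive part cannot be shown on a web page; run it in R and click to select specific rules. The call is commented out with # in the code block below).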
# plot(rules09, interactive=TRUE)
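We can also set the shading argument to "order" to draw a special scatter plot, the two-key plot, in which color intensity represents the number of items in the rule: the more items (the higher the order), the darker the point.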
plot(rules09, shading="order", control=list(main = "Two-key plot"))
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
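Next, change the plot method to "grouped" to produce a grouped matrix plot. Judged by lift, the strongest association (darkest circles) is between the group containing 薑絲炒牛肚, 炒牛雜, and related items and the item 油菜炒牛肉; the combinations with the highest support (largest circles) can be read off in the same way.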
plot(rules09, method= "grouped")
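The method argument can also be set to "matrix", "matrix3D", "paracoord", and so on:
The plot below uses the "matrix" method and shows the lift of the first 50 association rules in rules09, arranged as 31 LHS (left-hand side) itemsets against 12 RHS (right-hand side) itemsets. Color intensity represents the lift value.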
plot(rules09[1:50], method="matrix", measure="lift")
## Itemsets in Antecedent (LHS)
## [1] "{油菜炒牛肉,高麗菜}" "{牛肉貢丸(7粒),油菜炒牛肉}"
## [3] "{牛心,溫體牛肉}" "{牛心,娃娃菜}"
## [5] "{油菜炒牛肉,炒牛雜}" "{牛心,高麗菜}"
## [7] "{油菜炒牛肉,薑絲炒牛肚}" "{牛心}"
## [9] "{牛心,溫體牛肉3+1}" "{牛筋,油菜炒牛肉}"
## [11] "{牛筋湯,溫體牛肉湯}" "{牛心,青江菜}"
## [13] "{炒牛雜麵,溫體牛肉}" "{麻油牛心}"
## [15] "{青江菜}" "{牛肉貢丸(7粒),炒牛雜麵}"
## [17] "{油菜炒牛肉}" "{牛雜}"
## [19] "{加肉}" "{玉米}"
## [21] "{牛筋湯}" "{高麗菜}"
## [23] "{牛筋}" "{牛心湯}"
## [25] "{牛筋湯,芥蘭炒牛肉}" "{牛三層肉}"
## [27] "{娃娃菜}" "{綜合菇}"
## [29] "{牛肉貢丸(7粒)}" "{洋蔥炒牛肉}"
## [31] "{三菇炒牛肉}"
## Itemsets in Consequent (RHS)
## [1] "{溫體牛肉湯}" "{溫體牛肉}" "{溫體牛肉3+1}" "{芥蘭炒牛肉}"
## [5] "{牛肉貢丸(7粒)}" "{高麗菜}" "{綜合菇}" "{娃娃菜}"
## [9] "{牛筋}" "{青江菜}" "{炒牛雜}" "{薑絲炒牛肚}"
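As an illustration of one of the other methods, the sketch below draws a parallel-coordinates plot for a handful of top rules. It runs on toy baskets (hypothetical data and thresholds) and assumes the installed arulesViz version supports method = "paracoord", as the text above indicates.

```r
library(arules)
library(arulesViz)

# Toy baskets (hypothetical data standing in for cow.csv).
baskets <- list(
  c("soup", "stir_fry"),
  c("soup", "meatballs", "cabbage"),
  c("soup", "stir_fry", "cabbage"),
  c("meatballs", "cabbage"),
  c("soup")
)
trans <- as(baskets, "transactions")

rules <- apriori(trans,
                 parameter = list(support = 0.2, confidence = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# Plot only a few of the strongest rules so the lines stay readable.
top <- head(rules, n = 5, by = "lift")
plot(top, method = "paracoord")
```

Restricting the plot to a small subset first, as with `rules09[1:50]` above, tends to matter more than the choice of method.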
The first 50 association rules in rules09 are listed in full below:
inspect(rules09[1:50])
## lhs rhs support confidence
## [1] {牛心湯} => {溫體牛肉湯} 0.003460208 0.6666667
## [2] {牛筋湯} => {芥蘭炒牛肉} 0.005190311 0.6000000
## [3] {牛心} => {青江菜} 0.005190311 0.6000000
## [4] {牛心} => {娃娃菜} 0.005190311 0.6000000
## [5] {牛心} => {高麗菜} 0.005190311 0.6000000
## [6] {牛心} => {溫體牛肉} 0.005190311 0.6000000
## [7] {油菜炒牛肉} => {高麗菜} 0.008650519 0.5555556
## [8] {油菜炒牛肉} => {牛肉貢丸(7粒)} 0.010380623 0.6666667
## [9] {油菜炒牛肉} => {溫體牛肉} 0.010380623 0.6666667
## [10] {洋蔥炒牛肉} => {溫體牛肉湯} 0.012110727 0.5000000
## [11] {麻油牛心} => {溫體牛肉3+1} 0.013840830 0.8888889
## [12] {三菇炒牛肉} => {溫體牛肉湯} 0.013840830 0.5000000
## [13] {牛雜} => {溫體牛肉} 0.029411765 0.8500000
## [14] {牛三層肉} => {溫體牛肉} 0.017301038 0.5555556
## [15] {牛筋} => {溫體牛肉} 0.025951557 0.7142857
## [16] {青江菜} => {綜合菇} 0.022491349 0.5416667
## [17] {青江菜} => {溫體牛肉} 0.024221453 0.5833333
## [18] {玉米} => {高麗菜} 0.031141869 0.5454545
## [19] {玉米} => {牛肉貢丸(7粒)} 0.029411765 0.5151515
## [20] {玉米} => {溫體牛肉3+1} 0.032871972 0.5757576
## [21] {加肉} => {溫體牛肉} 0.074394464 0.8269231
## [22] {綜合菇} => {溫體牛肉} 0.053633218 0.5081967
## [23] {娃娃菜} => {溫體牛肉} 0.062283737 0.5294118
## [24] {高麗菜} => {牛肉貢丸(7粒)} 0.084775087 0.5104167
## [25] {牛肉貢丸(7粒)} => {溫體牛肉} 0.096885813 0.5045045
## [26] {牛筋湯, 芥蘭炒牛肉} => {溫體牛肉湯} 0.003460208 0.6666667
## [27] {牛筋湯, 溫體牛肉湯} => {芥蘭炒牛肉} 0.003460208 1.0000000
## [28] {牛心, 青江菜} => {娃娃菜} 0.003460208 0.6666667
## [29] {牛心, 娃娃菜} => {青江菜} 0.003460208 0.6666667
## [30] {牛心, 青江菜} => {高麗菜} 0.003460208 0.6666667
## [31] {牛心, 高麗菜} => {青江菜} 0.003460208 0.6666667
## [32] {牛心, 青江菜} => {溫體牛肉} 0.003460208 0.6666667
## [33] {牛心, 溫體牛肉} => {青江菜} 0.003460208 0.6666667
## [34] {牛心, 娃娃菜} => {溫體牛肉} 0.005190311 1.0000000
## [35] {牛心, 溫體牛肉} => {娃娃菜} 0.005190311 1.0000000
## [36] {牛心, 高麗菜} => {溫體牛肉3+1} 0.003460208 0.6666667
## [37] {牛心, 溫體牛肉3+1} => {高麗菜} 0.003460208 1.0000000
## [38] {牛肉貢丸(7粒), 炒牛雜麵} => {溫體牛肉} 0.005190311 1.0000000
## [39] {炒牛雜麵, 溫體牛肉} => {牛肉貢丸(7粒)} 0.005190311 0.7500000
## [40] {牛筋, 油菜炒牛肉} => {高麗菜} 0.005190311 1.0000000
## [41] {油菜炒牛肉, 高麗菜} => {牛筋} 0.005190311 0.6000000
## [42] {牛筋, 油菜炒牛肉} => {牛肉貢丸(7粒)} 0.005190311 1.0000000
## [43] {牛肉貢丸(7粒), 油菜炒牛肉} => {牛筋} 0.005190311 0.5000000
## [44] {牛筋, 油菜炒牛肉} => {溫體牛肉} 0.003460208 0.6666667
## [45] {油菜炒牛肉, 薑絲炒牛肚} => {炒牛雜} 0.003460208 1.0000000
## [46] {油菜炒牛肉, 炒牛雜} => {薑絲炒牛肚} 0.003460208 1.0000000
## [47] {油菜炒牛肉, 薑絲炒牛肚} => {牛肉貢丸(7粒)} 0.003460208 1.0000000
## [48] {油菜炒牛肉, 薑絲炒牛肚} => {溫體牛肉} 0.003460208 1.0000000
## [49] {油菜炒牛肉, 炒牛雜} => {牛肉貢丸(7粒)} 0.003460208 1.0000000
## [50] {油菜炒牛肉, 炒牛雜} => {溫體牛肉} 0.003460208 1.0000000
## coverage lift count
## [1] 0.005190311 1.986254 2
## [2] 0.008650519 2.709375 3
## [3] 0.008650519 14.450000 3
## [4] 0.008650519 5.100000 3
## [5] 0.008650519 3.612500 3
## [6] 0.008650519 2.127607 3
## [7] 0.015570934 3.344907 5
## [8] 0.015570934 3.471471 6
## [9] 0.015570934 2.364008 6
## [10] 0.024221453 1.489691 7
## [11] 0.015570934 3.618153 8
## [12] 0.027681661 1.489691 8
## [13] 0.034602076 3.014110 17
## [14] 0.031141869 1.970007 10
## [15] 0.036332180 2.532866 15
## [16] 0.041522491 5.132514 13
## [17] 0.041522491 2.068507 14
## [18] 0.057093426 3.284091 18
## [19] 0.057093426 2.682501 17
## [20] 0.057093426 2.343577 19
## [21] 0.089965398 2.932279 43
## [22] 0.105536332 1.802072 31
## [23] 0.117647059 1.877301 36
## [24] 0.166089965 2.657845 49
## [25] 0.192041522 1.788979 56
## [26] 0.005190311 1.986254 2
## [27] 0.003460208 4.515625 2
## [28] 0.005190311 5.666667 2
## [29] 0.005190311 16.055556 2
## [30] 0.005190311 4.013889 2
## [31] 0.005190311 16.055556 2
## [32] 0.005190311 2.364008 2
## [33] 0.005190311 16.055556 2
## [34] 0.005190311 3.546012 3
## [35] 0.005190311 8.500000 3
## [36] 0.005190311 2.713615 2
## [37] 0.003460208 6.020833 2
## [38] 0.005190311 3.546012 3
## [39] 0.006920415 3.905405 3
## [40] 0.005190311 6.020833 3
## [41] 0.008650519 16.514286 3
## [42] 0.005190311 5.207207 3
## [43] 0.010380623 13.761905 3
## [44] 0.005190311 2.364008 2
## [45] 0.003460208 17.000000 2
## [46] 0.003460208 19.931034 2
## [47] 0.003460208 5.207207 2
## [48] 0.003460208 3.546012 2
## [49] 0.003460208 5.207207 2
## [50] 0.003460208 3.546012 2
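Because inspect() wraps wide output into separate blocks, as seen above, it can be easier to work with the rules as a data frame. The following is a minimal sketch on toy baskets (hypothetical data and thresholds) that converts a rule set and prints a compact, rounded view.

```r
library(arules)

# Toy baskets (hypothetical data standing in for cow.csv).
baskets <- list(
  c("soup", "stir_fry"),
  c("soup", "meatballs", "cabbage"),
  c("soup", "stir_fry", "cabbage"),
  c("meatballs", "cabbage"),
  c("soup")
)
trans <- as(baskets, "transactions")

rules <- apriori(trans,
                 parameter = list(support = 0.2, confidence = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# One row per rule: a text label plus the quality measures.
rules_df <- as(rules, "data.frame")

# Compact view: selected columns, measures rounded to 3 digits.
compact <- rules_df[, c("rules", "support", "confidence", "lift", "count")]
num_cols <- c("support", "confidence", "lift")
compact[num_cols] <- round(compact[num_cols], 3)

print(compact, row.names = FALSE)
```

The same conversion on `rules09[1:50]` would put each rule and all of its measures on a single row.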