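Market basket analysis, also known as association analysis, mines large volumes of transaction data for hidden business rules that link items to one another; the classic example is beer and diapers. The following walks through a simple example in R.
First, install and load the arules package: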
# install.packages("arules", repos="http://cran.us.r-project.org")
library(arules)
## Warning: package 'arules' was built under R version 4.2.3
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.2.3
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
Next, we use the Groceries dataset that ships with arules and look at its summary.
data("Groceries")
summary(Groceries)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 169 columns (items) and a density of 0.02609146
##
## most frequent items:
## whole milk other vegetables rolls/buns soda
## 2513 1903 1809 1715
## yogurt (Other)
## 1372 34055
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55 46
## 17 18 19 20 21 22 23 24 26 27 28 29 32
## 29 14 14 9 11 4 6 1 1 1 1 3 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 4.409 6.000 32.000
##
## includes extended item information - examples:
## labels level2 level1
## 1 frankfurter sausage meat and sausage
## 2 sausage sausage meat and sausage
## 3 liver loaf sausage meat and sausage
Show the first eight transactions of the Groceries dataset:
inspect(Groceries[1:8])
## items
## [1] {citrus fruit,
## semi-finished bread,
## margarine,
## ready soups}
## [2] {tropical fruit,
## yogurt,
## coffee}
## [3] {whole milk}
## [4] {pip fruit,
## yogurt,
## cream cheese ,
## meat spreads}
## [5] {other vegetables,
## whole milk,
## condensed milk,
## long life bakery product}
## [6] {whole milk,
## butter,
## yogurt,
## rice,
## abrasive cleaner}
## [7] {rolls/buns}
## [8] {other vegetables,
## UHT-milk,
## rolls/buns,
## bottled beer,
## liquor (appetizer)}
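Each transaction represents the items in one customer's basket. For example, the first customer bought four items: citrus fruit, semi-finished bread, margarine, and ready soups.
Besides looking at the items themselves, we can use the size function to see how many distinct items each transaction contains.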
size(Groceries[1:8])
## [1] 4 3 1 4 4 5 1 5
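Next, the itemFrequency function computes the share of transactions that contain each item (its relative frequency), which also reveals the most frequently purchased items.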
itemFrequency(Groceries)
## frankfurter sausage liver loaf
## 0.0589730554 0.0939501779 0.0050838841
## ham meat finished products
## 0.0260294865 0.0258261312 0.0065073716
## organic sausage chicken turkey
## 0.0022369090 0.0429079817 0.0081342145
## pork beef hamburger meat
## 0.0576512456 0.0524656838 0.0332486019
## fish citrus fruit tropical fruit
## 0.0029486528 0.0827656329 0.1049313676
## pip fruit grapes berries
## 0.0756481952 0.0223690900 0.0332486019
## nuts/prunes root vegetables onions
## 0.0033553635 0.1089984748 0.0310116929
## herbs other vegetables packaged fruit/vegetables
## 0.0162684291 0.1934926284 0.0130147433
## whole milk butter curd
## 0.2555160142 0.0554143366 0.0532791052
## dessert butter milk yogurt
## 0.0371123538 0.0279613625 0.1395017794
## whipped/sour cream beverages UHT-milk
## 0.0716827656 0.0260294865 0.0334519573
## condensed milk cream soft cheese
## 0.0102694459 0.0013218099 0.0170818505
## sliced cheese hard cheese cream cheese
## 0.0245043213 0.0245043213 0.0396542959
## processed cheese spread cheese curd cheese
## 0.0165734621 0.0111845450 0.0050838841
## specialty cheese mayonnaise salad dressing
## 0.0085409253 0.0091509914 0.0008134215
## tidbits frozen vegetables frozen fruits
## 0.0023385867 0.0480935435 0.0012201322
## frozen meals frozen fish frozen chicken
## 0.0283680732 0.0116929334 0.0006100661
## ice cream frozen dessert frozen potato products
## 0.0250127097 0.0107778343 0.0084392476
## domestic eggs rolls/buns white bread
## 0.0634468734 0.1839349263 0.0420945602
## brown bread pastry roll products
## 0.0648703610 0.0889679715 0.0102694459
## semi-finished bread zwieback potato products
## 0.0176919166 0.0069140824 0.0028469751
## flour salt rice
## 0.0173868836 0.0107778343 0.0076258261
## pasta vinegar oil
## 0.0150482969 0.0065073716 0.0280630402
## margarine specialty fat sugar
## 0.0585663447 0.0036603965 0.0338586680
## artif. sweetener honey mustard
## 0.0032536858 0.0015251652 0.0119979664
## ketchup spices soups
## 0.0042704626 0.0051855618 0.0068124047
## ready soups Instant food products sauces
## 0.0018301983 0.0080325369 0.0054905948
## cereals organic products baking powder
## 0.0056939502 0.0016268429 0.0176919166
## preservation products pudding powder canned vegetables
## 0.0002033554 0.0023385867 0.0107778343
## canned fruit pickled vegetables specialty vegetables
## 0.0032536858 0.0178952720 0.0017285206
## jam sweet spreads meat spreads
## 0.0053889171 0.0090493137 0.0042704626
## canned fish dog food cat food
## 0.0150482969 0.0085409253 0.0232841891
## pet care baby food coffee
## 0.0094560244 0.0001016777 0.0580579563
## instant coffee tea cocoa drinks
## 0.0074224708 0.0038637519 0.0022369090
## bottled water soda misc. beverages
## 0.1105236401 0.1743772242 0.0283680732
## fruit/vegetable juice syrup bottled beer
## 0.0722928317 0.0032536858 0.0805287239
## canned beer brandy whisky
## 0.0776817489 0.0041687850 0.0008134215
## liquor rum liqueur
## 0.0110828673 0.0044738180 0.0009150991
## liquor (appetizer) white wine red/blush wine
## 0.0079308592 0.0190137265 0.0192170819
## prosecco sparkling wine salty snack
## 0.0020335536 0.0055922725 0.0378240976
## popcorn nut snack snack products
## 0.0072191154 0.0031520081 0.0030503305
## long life bakery product waffles cake bar
## 0.0374173869 0.0384341637 0.0132180986
## chewing gum chocolate cooking chocolate
## 0.0210472801 0.0496187087 0.0025419420
## specialty chocolate specialty bar chocolate marshmallow
## 0.0304016268 0.0273512964 0.0090493137
## candy seasonal products detergent
## 0.0298932384 0.0142348754 0.0192170819
## softener decalcifier dish cleaner
## 0.0054905948 0.0015251652 0.0104728012
## abrasive cleaner cleaner toilet cleaner
## 0.0035587189 0.0050838841 0.0007117438
## bathroom cleaner hair spray dental care
## 0.0027452974 0.0011184545 0.0057956279
## male cosmetics make up remover skin care
## 0.0045754957 0.0008134215 0.0035587189
## female sanitary products baby cosmetics soap
## 0.0061006609 0.0006100661 0.0026436197
## rubbing alcohol hygiene articles napkins
## 0.0010167768 0.0329435689 0.0523640061
## dishes cookware kitchen utensil
## 0.0175902389 0.0027452974 0.0004067107
## cling film/bags kitchen towels house keeping products
## 0.0113879004 0.0059989832 0.0083375699
## candles light bulbs sound storage medium
## 0.0089476360 0.0041687850 0.0001016777
## newspapers photo/film pot plants
## 0.0798169802 0.0092526690 0.0172852059
## flower soil/fertilizer flower (seeds) shopping bags
## 0.0019318760 0.0103711235 0.0985256736
## bags
## 0.0004067107
With 169 items this listing is too cluttered to read, so we use itemFrequencyPlot to plot the top 10 items instead.
itemFrequencyPlot(Groceries, topN = 10)
itemFrequencyPlot(Groceries, topN = 10, type = "absolute")
itemFrequencyPlot(Groceries, topN = 10, horiz = TRUE,
main = "Item Frequency", xlab = "Relative Frequency")
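Plot only the items whose relative frequency is at least 0.1. The support argument is the minimum support; in itemFrequencyPlot it is NULL by default, in which case every item is drawn and the chart becomes cluttered. (0.1 is a common choice and is also the default minimum support of apriori.)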
itemFrequencyPlot(Groceries,support = 0.1,
main = "Item Frequency with S = 0.1",ylab = "Relative Frequency")
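The same ranking can also be read as numbers rather than from a plot. The following is a minimal sketch, assuming the arules package is installed; the variable names freq and top10 are illustrative only.

```r
library(arules)
data("Groceries")

# Relative frequency of every item, as a named numeric vector
freq <- itemFrequency(Groceries)

# Ten most frequent items, largest first
top10 <- head(sort(freq, decreasing = TRUE), 10)
print(round(top10, 4))

# Items that clear the 0.1 support threshold used in the plot above
print(names(freq[freq >= 0.1]))
```

Sorting the named vector directly avoids scanning the full 169-item listing by eye.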
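Next, we try the apriori function (the Apriori algorithm) and the eclat function (Eclat, Equivalence CLAss Transformation) to see whether anything interesting can be discovered in the data.
In broad terms, apriori is driven by two thresholds, minimum support and minimum confidence; the rules that pass are then assessed with two further measures, lift and coverage:
Support: how common the rule is in the data, i.e. the probability that A and B appear together. \(support(A \longrightarrow B) = P(A,B)\)
Confidence: how reliable the rule is, i.e. the conditional probability that B is bought given that A is bought. \(confidence(A \longrightarrow B) = P(B|A) = P(A,B)/P(A)\)
Lift: how much the presence of A raises the chance of B, i.e. the conditional probability \(P(B|A)\) (the rate at which B appears given A) relative to the baseline probability \(P(B)\). A lift above 1 indicates a positive association. \(lift(A \longrightarrow B) = P(B|A)/P(B) = confidence(A \longrightarrow B)/P(B)\)
Coverage: how often the rule can be applied, i.e. the share of transactions that contain A; also called LHS-support. \(coverage(A \longrightarrow B) = P(A)\)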
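To make the four definitions concrete, here is a small worked example in base R. The five baskets are hypothetical toy data (not taken from Groceries), and the helper p() is an illustrative name, not part of arules.

```r
# Five hypothetical baskets (toy data, not taken from Groceries)
baskets <- list(
  c("beer", "diapers", "milk"),
  c("beer", "diapers"),
  c("milk", "bread"),
  c("diapers", "milk"),
  c("beer", "bread")
)

# Share of baskets that contain every item in `items`
p <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

A <- "diapers"
B <- "beer"

support    <- p(c(A, B))          # P(A,B)
coverage   <- p(A)                # P(A)
confidence <- support / coverage  # P(B|A) = P(A,B) / P(A)
lift       <- confidence / p(B)   # P(B|A) / P(B)

print(c(support = support, coverage = coverage,
        confidence = confidence, lift = lift))
```

In this toy data the rule {diapers} => {beer} has a lift slightly above 1, so beer is somewhat more likely in baskets that contain diapers than in baskets overall.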
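First we call apriori with very loose constraints and adjust based on the result. The minimum support is set to 0.001 and the minimum confidence to 0.5 for now; all other parameters keep their defaults, and the resulting rules are stored in rules0: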
rules0=apriori(Groceries,parameter=list(support=0.001,confidence=0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [5668 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
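The output shows the parameter specification (including the minimum support and confidence), the algorithmic control settings used during the run, and some execution details.
rules0 # show how many association rules rules0 contains
## set of 5668 rules
inspect(rules0[1:10]) # show the first 10 rules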
## lhs rhs support confidence
## [1] {honey} => {whole milk} 0.001118454 0.7333333
## [2] {tidbits} => {rolls/buns} 0.001220132 0.5217391
## [3] {cocoa drinks} => {whole milk} 0.001321810 0.5909091
## [4] {pudding powder} => {whole milk} 0.001321810 0.5652174
## [5] {cooking chocolate} => {whole milk} 0.001321810 0.5200000
## [6] {cereals} => {whole milk} 0.003660397 0.6428571
## [7] {jam} => {whole milk} 0.002948653 0.5471698
## [8] {specialty cheese} => {other vegetables} 0.004270463 0.5000000
## [9] {rice} => {other vegetables} 0.003965430 0.5200000
## [10] {rice} => {whole milk} 0.004677173 0.6133333
## coverage lift count
## [1] 0.001525165 2.870009 11
## [2] 0.002338587 2.836542 12
## [3] 0.002236909 2.312611 13
## [4] 0.002338587 2.212062 13
## [5] 0.002541942 2.035097 13
## [6] 0.005693950 2.515917 36
## [7] 0.005388917 2.141431 29
## [8] 0.008540925 2.584078 42
## [9] 0.007625826 2.687441 39
## [10] 0.007625826 2.400371 46
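rules0 contains 5668 association rules, so printing all of them would not be very useful. Moreover, the first 10 rules show that the order in which rules are listed bears no obvious relation to the values of their four measures (support, confidence, lift, coverage).
Faced with this much unordered output, a better approach is to tighten the thresholds so that only a handful of strong rules are generated. Below we try several parameter settings: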
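Rather than reading the measures off the printed rules, they can be pulled out as a data frame and summarized. This is a sketch, assuming an arules version that reports a coverage column (as in the output above); verbose = FALSE is used only to silence the mining log.

```r
library(arules)
data("Groceries")

rules0 <- apriori(Groceries,
                  parameter = list(support = 0.001, confidence = 0.5),
                  control = list(verbose = FALSE))

# Quality measures as an ordinary data frame, one row per rule
q <- quality(rules0)
print(summary(q[, c("support", "confidence", "lift")]))

# Check the definition: confidence = support / coverage
print(all.equal(q$confidence, q$support / q$coverage))
```

The summary gives a quick sense of how the measures are distributed before any thresholds are tightened.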
rules01=apriori(Groceries,parameter=list(support=0.005,confidence=0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 49
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [120 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [120 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules01
## set of 120 rules
rules02=apriori(Groceries,parameter=list(support=0.005,confidence=0.6))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 49
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [120 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.01s].
## writing ... [22 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules02
## set of 22 rules
rules03=apriori(Groceries,parameter=list(support=0.005,confidence=0.64))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.64 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 49
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [120 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.01s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules03
## set of 4 rules
Take a closer look at the rules in rules03:
inspect(rules03)
## lhs rhs support confidence coverage lift count
## [1] {butter,
## whipped/sour cream} => {whole milk} 0.006710727 0.6600000 0.010167768 2.583008 66
## [2] {pip fruit,
## whipped/sour cream} => {whole milk} 0.005998983 0.6483516 0.009252669 2.537421 59
## [3] {pip fruit,
## root vegetables,
## other vegetables} => {whole milk} 0.005490595 0.6750000 0.008134215 2.641713 54
## [4] {tropical fruit,
## root vegetables,
## yogurt} => {whole milk} 0.005693950 0.7000000 0.008134215 2.739554 56
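Re-running apriori for every threshold is not strictly necessary: an existing rule set can be filtered with subset. The sketch below applies the rules03 thresholds to rules0; the name strong is illustrative. The result should closely match rules03, although apriori converts the minimum support into an absolute count internally, so small differences at the boundary are possible.

```r
library(arules)
data("Groceries")

rules0 <- apriori(Groceries,
                  parameter = list(support = 0.001, confidence = 0.5),
                  control = list(verbose = FALSE))

# Filter the already mined rules instead of mining again
strong <- subset(rules0, support >= 0.005 & confidence >= 0.64)
inspect(strong)
```

Filtering is convenient when exploring thresholds interactively, since the loosely constrained rule set only has to be mined once.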
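Alternatively, we can keep an already mined rule set (such as rules0, which was mined with a confidence threshold of 0.5) and pick the strongest rules by a chosen measure. Here rules0 is sorted by support, confidence, and lift in turn, the results are stored as rules.sorted_supp, rules.sorted_conf, and rules.sorted_lift, and the top 6 rules of each are shown: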
rules.sorted_supp = sort ( rules0, by="support" )
inspect ( rules.sorted_supp [1:6] )
## lhs rhs support
## [1] {other vegetables, yogurt} => {whole milk} 0.02226741
## [2] {tropical fruit, yogurt} => {whole milk} 0.01514997
## [3] {other vegetables, whipped/sour cream} => {whole milk} 0.01464159
## [4] {root vegetables, yogurt} => {whole milk} 0.01453991
## [5] {pip fruit, other vegetables} => {whole milk} 0.01352313
## [6] {root vegetables, yogurt} => {other vegetables} 0.01291307
## confidence coverage lift count
## [1] 0.5128806 0.04341637 2.007235 219
## [2] 0.5173611 0.02928317 2.024770 149
## [3] 0.5070423 0.02887646 1.984385 144
## [4] 0.5629921 0.02582613 2.203354 143
## [5] 0.5175097 0.02613116 2.025351 133
## [6] 0.5000000 0.02582613 2.584078 127
rules.sorted_conf = sort ( rules0, by="confidence" )
inspect ( rules.sorted_conf [1:6] )
## lhs rhs support confidence coverage lift count
## [1] {rice,
## sugar} => {whole milk} 0.001220132 1 0.001220132 3.913649 12
## [2] {canned fish,
## hygiene articles} => {whole milk} 0.001118454 1 0.001118454 3.913649 11
## [3] {root vegetables,
## butter,
## rice} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
## [4] {root vegetables,
## whipped/sour cream,
## flour} => {whole milk} 0.001728521 1 0.001728521 3.913649 17
## [5] {butter,
## soft cheese,
## domestic eggs} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
## [6] {citrus fruit,
## root vegetables,
## soft cheese} => {other vegetables} 0.001016777 1 0.001016777 5.168156 10
rules.sorted_lift = sort ( rules0, by="lift" )
inspect ( rules.sorted_lift [1:6] )
## lhs rhs support confidence coverage lift count
## [1] {Instant food products,
## soda} => {hamburger meat} 0.001220132 0.6315789 0.001931876 18.99565 12
## [2] {soda,
## popcorn} => {salty snack} 0.001220132 0.6315789 0.001931876 16.69779 12
## [3] {flour,
## baking powder} => {sugar} 0.001016777 0.5555556 0.001830198 16.40807 10
## [4] {ham,
## processed cheese} => {white bread} 0.001931876 0.6333333 0.003050330 15.04549 19
## [5] {whole milk,
## Instant food products} => {hamburger meat} 0.001525165 0.5000000 0.003050330 15.03823 15
## [6] {other vegetables,
## curd,
## yogurt,
## whipped/sour cream} => {cream cheese } 0.001016777 0.5882353 0.001728521 14.83409 10
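As the definitions above suggest, lift is often the most informative single measure for screening rules, because it corrects for how common the RHS item is on its own. Note, however, that the top rules here rest on only 10 to 19 transactions each, so they should be read with some caution. The conclusions it yields are often interesting and suggestive: the strongest rule is {Instant food products, soda} \(\longrightarrow\) {hamburger meat}, followed by {soda, popcorn} \(\longrightarrow\) {salty snack}. One can even imagine the shopping scenario: consumers who have had a busy week and do not feel like cooking plan to unwind over the weekend with cola and other sodas, chips, popcorn, and similar junk food, frying a hamburger patty or simply making instant noodles when they get hungry.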
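Because very high lift values can come from rules backed by only a few transactions, one option is to require a minimum transaction count before ranking by lift. This is a sketch only; the cut-off min_count = 30 is a hypothetical value chosen for illustration, not a recommendation from the original analysis.

```r
library(arules)
data("Groceries")

rules0 <- apriori(Groceries,
                  parameter = list(support = 0.001, confidence = 0.5),
                  control = list(verbose = FALSE))

min_count <- 30  # hypothetical cut-off, chosen only for illustration

# Keep rules supported by at least min_count transactions
robust <- subset(rules0, count >= min_count)

# Rank the remaining rules by lift and show the top 6
inspect(head(sort(robust, by = "lift"), 6))
```

Raising or lowering min_count trades off how surprising the top rules are against how much data stands behind them.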
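Next, an example of how association rules are used in practice. Stores often sell two products bundled together, perhaps because the retailer wants to promote one of them. Suppose the store wants to promote a less popular product, mustard. By setting rhs to "mustard" in the appearance argument of apriori, we search only for rules whose RHS is mustard and so find the products most strongly associated with it, which serve as candidates for bundling. Setting maxlen to 2 limits each rule to two items in total, so the LHS contains exactly one product; after all, the goal is to bundle two products, not a whole pile.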
rules04=apriori(Groceries,parameter=list(maxlen=2,supp=0.001,conf=0.05),appearance=list(rhs="mustard",default="lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.05 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(Groceries, parameter = list(maxlen = 2, supp = 0.001, :
## Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
## done [0.00s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect ( rules04 )
## lhs rhs support confidence coverage
## [1] {mayonnaise} => {mustard} 0.001423488 0.15555556 0.009150991
## [2] {canned fish} => {mustard} 0.001016777 0.06756757 0.015048297
## [3] {pickled vegetables} => {mustard} 0.001016777 0.05681818 0.017895272
## [4] {oil} => {mustard} 0.001423488 0.05072464 0.028063040
## lift count
## [1] 12.965160 14
## [2] 5.631585 10
## [3] 4.735651 10
## [4] 4.227770 14
rules05=apriori(Groceries,parameter=list(maxlen=2,supp=0.001,conf=0.1),appearance=list(rhs="mustard",default="lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 2 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(Groceries, parameter = list(maxlen = 2, supp = 0.001, :
## Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
## done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect ( rules05 )
## lhs rhs support confidence coverage lift count
## [1] {mayonnaise} => {mustard} 0.001423488 0.1555556 0.009150991 12.96516 14
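The results show that mayonnaise is the product most strongly associated with mustard (lift of about 12.97), so selling the two as a bundle is worth considering.
Both apriori and eclat can also return other kinds of results, such as frequent itemsets. For example, to find this month's best-selling products, or the product pairs for which a bundling strategy would matter most, we can output the frequent itemsets under given conditions.
For example, setting the target parameter to "frequent itemsets" gives the following: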
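The bundling candidates can also be collected into a small table ranked by lift. This sketch reuses the rules04 settings from above; the data frame candidates and its column names are illustrative, and suppressWarnings only hides the expected maxlen warning.

```r
library(arules)
data("Groceries")

# Same settings as rules04; the maxlen warning is expected, so it is suppressed
rules04 <- suppressWarnings(apriori(
  Groceries,
  parameter  = list(maxlen = 2, supp = 0.001, conf = 0.05),
  appearance = list(rhs = "mustard", default = "lhs"),
  control    = list(verbose = FALSE)
))

# Rank candidate partner products by lift, strongest first
ranked <- sort(rules04, by = "lift")
candidates <- data.frame(
  partner = labels(lhs(ranked)),
  lift    = quality(ranked)$lift,
  count   = quality(ranked)$count
)
print(candidates)
```

Keeping the count next to the lift makes it easier to judge how much evidence supports each candidate pairing.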
itemsets_apr = apriori ( Groceries, parameter = list (supp=0.001,target = "frequent itemsets"),control=list(sort=-1)) # set the target parameter of apriori to frequent itemsets
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## NA 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 frequent itemsets TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE -1 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.02s].
## sorting transactions ... done [0.00s].
## writing ... [13492 set(s)] done [0.00s].
## creating S4 object ... done [0.00s].
itemsets_apr # show how many frequent itemsets were generated
## set of 13492 itemsets
inspect(itemsets_apr[1:5]) # inspect the first 5 frequent itemsets
## items support count
## [1] {whole milk} 0.2555160 2513
## [2] {other vegetables} 0.1934926 1903
## [3] {rolls/buns} 0.1839349 1809
## [4] {soda} 0.1743772 1715
## [5] {yogurt} 0.1395018 1372
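As shown above, with the sort control parameter set to order items by descending frequency, the five best-selling items are whole milk, other vegetables, rolls/buns, soda, and yogurt.
Next we use the eclat function to mine frequent itemsets as candidates for bundling (or for adjacent shelf placement). Note that the first five itemsets printed below, such as {whole milk, honey}, {whole milk, cocoa drinks}, and {whole milk, pudding powder}, are simply listed in the order eclat emitted them: each appears in only 10 to 13 transactions, just above the 0.001 support threshold, so they are not the most frequently co-occurring pairs. To find the pairs best suited to bundling or adjacent placement, the result should be restricted to itemsets of size 2 and sorted by support.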
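The ordering above depends on the internal item ordering set through control. A sketch that does not rely on it is to sort the mined itemsets explicitly; the names singles and top5 are illustrative.

```r
library(arules)
data("Groceries")

itemsets_apr <- apriori(Groceries,
                        parameter = list(supp = 0.001,
                                         target = "frequent itemsets"),
                        control = list(verbose = FALSE))

# Keep single-item itemsets, then rank them by support explicitly
singles <- itemsets_apr[size(itemsets_apr) == 1]
top5 <- head(sort(singles, by = "support"), 5)
inspect(top5)
```

Sorting by support after mining makes the intent explicit and keeps the ranking independent of how the algorithm orders its output.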
itemsets_ecl = eclat( Groceries, parameter = list ( minlen=1, maxlen=3,supp=0.001, target = "frequent itemsets"),control=list(sort=-1))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.001 1 3 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -1 TRUE
##
## Absolute minimum support count: 9
##
## create itemset ...
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating sparse bit matrix ... [157 row(s), 9835 column(s)] done [0.00s].
## writing ... [9969 set(s)] done [0.02s].
## Creating S4 object ... done [0.00s].
itemsets_ecl
## set of 9969 itemsets
inspect(itemsets_ecl[1:5])
## items support count
## [1] {whole milk, honey} 0.001118454 11
## [2] {whole milk, cocoa drinks} 0.001321810 13
## [3] {whole milk, pudding powder} 0.001321810 13
## [4] {tidbits, rolls/buns} 0.001220132 12
## [5] {tidbits, soda} 0.001016777 10
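Since the first five itemsets above all have supports close to the 0.001 threshold, finding the pairs that co-occur most often requires filtering and sorting. The sketch below keeps the same eclat settings, restricts the result to itemsets of size 2, and ranks them by support; pair_sets and top_pairs are illustrative names.

```r
library(arules)
data("Groceries")

itemsets_ecl <- eclat(Groceries,
                      parameter = list(minlen = 1, maxlen = 3, supp = 0.001,
                                       target = "frequent itemsets"),
                      control = list(verbose = FALSE))

# Keep only two-item itemsets, then rank them by support
pair_sets <- itemsets_ecl[size(itemsets_ecl) == 2]
top_pairs <- head(sort(pair_sets, by = "support"), 5)
inspect(top_pairs)
```

Pairs ranked this way are dominated by very popular items, so in practice it may be worth looking at lift as well before deciding on a bundle.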