1 Market Basket Analysis
2 Source Code
> summary(Data)
transactions as itemMatrix in sparse format with
300 rows (elements/itemsets/transactions) and
140 columns (items) and a density of 0.0352381
most frequent items:
whole milk other vegetables soda rolls/buns
83 61 59 56
shopping bags (Other)
47 1174
element (itemset/transaction) length distribution:
sizes
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 32
61 43 31 45 30 15 14 6 12 11 7 7 3 4 3 1 3 1 2 1
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 4.000 4.933 6.250 32.000
includes extended item information - examples:
labels
1 abrasive cleaner
2 baby cosmetics
3 baking powder
Based on the data used, there are 300 transactions and 140 distinct items sold at the store.
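The figures in the summary can be cross-checked against each other. The sketch below is only arithmetic on numbers copied from the output above (the variable names are illustrative, not part of arules): the density times the matrix size gives the total number of item occurrences, which should match both the item counts and the basket-size distribution.

```r
# Values copied from the summary(Data) output above
n_transactions <- 300
n_items <- 140
density <- 0.0352381

# Density is the fraction of non-zero cells in the transaction-by-item matrix
total_purchases <- round(n_transactions * n_items * density)
mean_basket_size <- total_purchases / n_transactions

# "most frequent items" counts, including the (Other) bucket
item_counts <- c(83, 61, 59, 56, 47, 1174)

# Basket-size distribution: size of basket and number of baskets of that size
sizes <- c(1:19, 32)
size_counts <- c(61, 43, 31, 45, 30, 15, 14, 6, 12, 11,
                 7, 7, 3, 4, 3, 1, 3, 1, 2, 1)

print(total_purchases)                  # total item occurrences
print(round(mean_basket_size, 3))       # should agree with the reported Mean
print(sum(item_counts))                 # same total from the item counts
print(sum(sizes * size_counts))         # same total from the size distribution
```

All three totals come to 1480 item occurrences, and 1480 / 300 reproduces the reported mean basket size of 4.933.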
> apply(Data@data[,1:6], 2, function(r)
+ paste(Data@itemInfo[r,"labels"], collapse=","))
[1] "frozen vegetables,fruit/vegetable juice,kitchen towels,margarine,oil,other vegetables,tropical fruit,UHT-milk,whipped/sour cream"
[2] "baking powder,bottled water,butter,frankfurter,frozen vegetables,long life bakery product,misc. beverages,newspapers,rolls/buns,sugar,tropical fruit,whipped/sour cream"
[3] "coffee"
[4] "bottled beer,chewing gum,ham,rolls/buns"
[5] "bottled water,curd,other vegetables,pickled vegetables,yogurt"
[6] "butter milk,curd,sausage,whipped/sour cream,whole milk"
These are the first 6 transactions in the data.
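The call above reads the internal `data` and `itemInfo` slots directly. A minimal sketch of the same kind of listing through arules accessor functions follows. It uses a small hypothetical object `toy` built by hand, not the original `Data`, and assumes the arules package is installed.

```r
library(arules)

# Hypothetical toy baskets, only for illustration
toy <- as(list(c("coffee"),
               c("bottled beer", "ham", "rolls/buns"),
               c("curd", "yogurt")),
          "transactions")

# LIST() decodes transactions back into character vectors of item labels
baskets <- LIST(toy[1:2], decode = TRUE)

# Collapse each basket into one comma-separated string
basket_strings <- vapply(baskets, paste, character(1), collapse = ",")
print(basket_strings)
```

With the real data, `toy[1:2]` would be replaced by `Data[1:6]`. `inspect(Data[1:6])` prints the same baskets in set notation.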
2.1 Frequent Itemsets
With a minimum support of 0.02:
> itemsets1 <- apriori(Data, parameter=list(minlen=1, maxlen=1,support=0.02,target="frequent itemsets"))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
NA 0.1 1 none FALSE TRUE 5 0.02 1
maxlen target ext
1 frequent itemsets TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 6
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[140 item(s), 300 transaction(s)] done [0.00s].
sorting and recoding items ... [67 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [67 set(s)] done [0.00s].
creating S4 object ... done [0.00s].
>
> inspect(head(sort(itemsets1,by="support"),10))
items support count
[1] {whole milk} 0.27666667 83
[2] {other vegetables} 0.20333333 61
[3] {soda} 0.19666667 59
[4] {rolls/buns} 0.18666667 56
[5] {shopping bags} 0.15666667 47
[6] {yogurt} 0.15000000 45
[7] {root vegetables} 0.13000000 39
[8] {tropical fruit} 0.10666667 32
[9] {bottled water} 0.10333333 31
[10] {whipped/sour cream} 0.09666667 29
For frequent 1-itemsets, 67 itemsets have a support of at least 0.02, that is, they appear in at least 6 of the 300 transactions.
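The support values can be reproduced by hand, since support is the number of transactions containing the itemset divided by the total number of transactions. The sketch below uses only counts copied from the output above; the variable names are illustrative.

```r
n_transactions <- 300
min_support <- 0.02

# Smallest count that still satisfies the minimum support
min_count <- ceiling(min_support * n_transactions)

# Counts of the top five single items, copied from the inspect() output
counts <- c("whole milk" = 83, "other vegetables" = 61, "soda" = 59,
            "rolls/buns" = 56, "shopping bags" = 47)

# support = count / number of transactions
support <- counts / n_transactions

print(min_count)
print(round(support, 8))
print(all(counts >= min_count))
```

The value of `min_count` matches the "Absolute minimum support count: 6" line in the apriori log, and 83 / 300 gives the 0.27666667 reported for whole milk.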
> itemsets2<- apriori(Data, parameter=list(minlen=2, maxlen=2, support=0.02,target="frequent itemsets"))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
NA 0.1 1 none FALSE TRUE 5 0.02 2
maxlen target ext
2 frequent itemsets TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 6
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[140 item(s), 300 transaction(s)] done [0.00s].
sorting and recoding items ... [67 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [148 set(s)] done [0.00s].
creating S4 object ... done [0.00s].
>
> inspect(head(sort(itemsets2,by="support"),10))
items support count
[1] {other vegetables, whole milk} 0.09333333 28
[2] {root vegetables, whole milk} 0.08000000 24
[3] {tropical fruit, whole milk} 0.06333333 19
[4] {soda, whole milk} 0.06333333 19
[5] {rolls/buns, whole milk} 0.06000000 18
[6] {whole milk, yogurt} 0.05666667 17
[7] {other vegetables, rolls/buns} 0.05666667 17
[8] {whipped/sour cream, whole milk} 0.05333333 16
[9] {other vegetables, shopping bags} 0.05333333 16
[10] {other vegetables, root vegetables} 0.05333333 16
For frequent 2-itemsets, 148 itemsets have a support of at least 0.02.
> itemsets3<- apriori(Data, parameter=list(minlen=3, maxlen=3, support=0.02,target="frequent itemsets"))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
NA 0.1 1 none FALSE TRUE 5 0.02 3
maxlen target ext
3 frequent itemsets TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 6
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[140 item(s), 300 transaction(s)] done [0.00s].
sorting and recoding items ... [67 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [33 set(s)] done [0.00s].
creating S4 object ... done [0.00s].
>
> inspect(head(sort(itemsets3,by="support"),10))
items support count
[1] {other vegetables, root vegetables, whole milk} 0.03666667 11
[2] {root vegetables, whole milk, yogurt} 0.03333333 10
[3] {butter, other vegetables, whole milk} 0.03000000 9
[4] {butter, root vegetables, whole milk} 0.02666667 8
[5] {root vegetables, whipped/sour cream, whole milk} 0.02666667 8
[6] {other vegetables, tropical fruit, whole milk} 0.02666667 8
[7] {rolls/buns, root vegetables, whole milk} 0.02666667 8
[8] {rolls/buns, whole milk, yogurt} 0.02666667 8
[9] {other vegetables, pork, whole milk} 0.02333333 7
[10] {citrus fruit, other vegetables, whole milk} 0.02333333 7
For frequent 3-itemsets, 33 itemsets have a support of at least 0.02.
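The drop from 148 two-item sets to 33 three-item sets is consistent with the property Apriori relies on: an itemset can never be more frequent than any of its subsets. A small check on the top 3-itemset, using only counts copied from the two outputs above (names are illustrative):

```r
# Counts of the three 2-item subsets of
# {other vegetables, root vegetables, whole milk}
pair_counts <- c("other vegetables, whole milk"      = 28,
                 "root vegetables, whole milk"       = 24,
                 "other vegetables, root vegetables" = 16)

# Count of the 3-itemset itself
triple_count <- 11

# The 3-itemset cannot occur more often than its rarest subset
print(min(pair_counts))
print(triple_count <= min(pair_counts))
```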
2.2 Confidence
> rules<-apriori(Data, parameter=list(supp=0.02))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.8 0.1 1 none FALSE TRUE 5 0.02 1
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 6
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[140 item(s), 300 transaction(s)] done [0.00s].
sorting and recoding items ... [67 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [13 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
>
> inspect(head(sort(rules,by="confidence"),5))
lhs rhs support confidence
[1] {butter, fruit/vegetable juice} => {whole milk} 0.02000000 1.0000000
[2] {butter, other vegetables} => {whole milk} 0.03000000 1.0000000
[3] {rolls/buns, root vegetables, yogurt} => {whole milk} 0.02000000 1.0000000
[4] {root vegetables, whipped/sour cream} => {whole milk} 0.02666667 0.8888889
[5] {butter, whipped/sour cream} => {whole milk} 0.02333333 0.8750000
coverage lift count
[1] 0.02000000 3.614458 6
[2] 0.03000000 3.614458 9
[3] 0.02000000 3.614458 6
[4] 0.03000000 3.212851 8
[5] 0.02666667 3.162651 7
The output above shows the 5 rules with the highest confidence. For example, rule [1] has a confidence of 1.0: in this data, every transaction that contains butter and fruit/vegetable juice (6 transactions) also contains whole milk.
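Confidence is the share of transactions containing the left-hand side that also contain the right-hand side, which is the same as support divided by coverage. The sketch below recomputes it from counts derived from the table above (rule count, and coverage times 300 for the left-hand side); the variable names are illustrative.

```r
n_transactions <- 300

# Transactions containing lhs and rhs together (the "count" column)
rule_count <- c(6, 9, 6, 8, 7)

# Transactions containing the lhs (coverage * 300, rounded to whole counts)
coverage <- c(0.02, 0.03, 0.02, 0.03, 0.02666667)
lhs_count <- round(coverage * n_transactions)

# confidence = count(lhs and rhs) / count(lhs)
confidence <- rule_count / lhs_count
print(round(confidence, 7))
```

Rules [4] and [5] work out to 8/9 and 7/8, matching the reported 0.8888889 and 0.8750000.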
2.3 Lift
> inspect(head(sort(rules,by="lift"),5))
lhs rhs support
[1] {onions, whole milk} => {root vegetables} 0.02
[2] {hard cheese} => {rolls/buns} 0.02
[3] {butter, fruit/vegetable juice} => {whole milk} 0.02
[4] {butter, other vegetables} => {whole milk} 0.03
[5] {rolls/buns, root vegetables, yogurt} => {whole milk} 0.02
confidence coverage lift count
[1] 0.8571429 0.02333333 6.593407 6
[2] 0.8571429 0.02333333 4.591837 6
[3] 1.0000000 0.02000000 3.614458 6
[4] 1.0000000 0.03000000 3.614458 9
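[5] 1.0000000 0.02000000 3.614458 6
The output above shows the 5 rules with the highest lift. For example, the rule {onions, whole milk} => {root vegetables} has a lift of 6.593407: transactions containing onions and whole milk include root vegetables about 6.6 times as often as transactions in general, which indicates a strong association.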
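Lift divides a rule's confidence by the overall support of its right-hand side, so a value above 1 means the right-hand side appears more often with the left-hand side than it does on its own. The sketch below recomputes the five lift values from counts copied from the outputs above; the variable names are illustrative.

```r
n_transactions <- 300

# Confidence of the five rules as exact fractions (6/7 = 0.8571429)
confidence <- c(6 / 7, 6 / 7, 1, 1, 1)

# Support of each rhs item, from the frequent 1-itemset counts:
# root vegetables 39, rolls/buns 56, whole milk 83
rhs_support <- c(39, 56, 83, 83, 83) / n_transactions

# lift = confidence / support(rhs)
lift <- confidence / rhs_support
print(round(lift, 6))
```

The three rules that predict whole milk share the same lift because they all have confidence 1 and the same right-hand side: 1 / (83/300) = 3.614458.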
2.4 Plot
> highLiftRules<-head(sort(rules,by="lift"),5)
> plot(highLiftRules,method="graph",
+ control=list(type="items",
+ edges = ggraph::geom_edge_link(
+ end_cap = ggraph::circle(4, "mm"),
+ start_cap = ggraph::circle(4, "mm"),
+ color = "black",
+ arrow = grid::arrow(length = grid::unit(2, "mm"), angle = 20, type = "closed"),
+ alpha = 0.3)))
Available control parameters (with default values):
layout = stress
circular = FALSE
ggraphdots = NULL
edges = <environment>
nodes = <environment>
nodetext = <environment>
colors = c("#EE0000FF", "#EEEEEEFF")
engine = ggplot2
max = 100
verbose = FALSE
In the plot, the size of each circle represents the support of a rule and its color represents the lift.
About 2% of transactions contain both hard cheese and rolls/buns (support 0.02), as shown by the size of the circle. The association between hard cheese and rolls/buns is strong (lift 4.59), as shown by the red color of the circle.
About 2% of transactions contain rolls/buns, yogurt, root vegetables, and whole milk together (support 0.02). The association is weaker than for the other rules in the plot (lift 3.61, the lowest of the five), so the circle is gray.
About 2% of transactions contain fruit/vegetable juice, butter, and whole milk together (support 0.02). The association is likewise comparatively weak (lift 3.61), so the circle is gray.
About 3% of transactions contain butter, other vegetables, and whole milk together (support 0.03). The association is likewise comparatively weak (lift 3.61), so the circle is gray.
About 2% of transactions contain whole milk, onions, and root vegetables together (support 0.02). This rule has the strongest association of the five (lift 6.59), so its circle is deep red.