The data comes from Kaggle. It consists of a set of transactions from a grocery store. The aim of the study is to create a set of association rules that help the grocery store maximize its profits.
library(arules)
library(arulesViz)
arules: A library for mining association rules in large datasets.
arulesViz: A library for visualizing association rules generated by the arules library.
df.ar <-
  read.transactions("grocery.csv", format = "basket", sep = ",", header = TRUE)
inspect(head(df.ar, 20))
## items
## [1] {citrus fruit,
## margarine,
## ready soups,
## semi-finished bread}
## [2] {coffee,
## tropical fruit,
## yogurt}
## [3] {whole milk}
## [4] {cream cheese,
## meat spreads,
## pip fruit,
## yogurt}
## [5] {condensed milk,
## long life bakery product,
## other vegetables,
## whole milk}
## [6] {abrasive cleaner,
## butter,
## rice,
## whole milk,
## yogurt}
## [7] {rolls/buns}
## [8] {bottled beer,
## liquor (appetizer),
## other vegetables,
## rolls/buns,
## UHT-milk}
## [9] {potted plants}
## [10] {cereals,
## whole milk}
## [11] {bottled water,
## chocolate,
## other vegetables,
## tropical fruit,
## white bread}
## [12] {bottled water,
## butter,
## citrus fruit,
## curd,
## dishes,
## flour,
## tropical fruit,
## whole milk,
## yogurt}
## [13] {beef}
## [14] {frankfurter,
## rolls/buns,
## soda}
## [15] {chicken,
## tropical fruit}
## [16] {butter,
## fruit/vegetable juice,
## newspapers,
## sugar}
## [17] {fruit/vegetable juice}
## [18] {packaged fruit/vegetables}
## [19] {chocolate}
## [20] {specialty bar}
Relative item frequency (the fraction of transactions that contain each item):
round(itemFrequency(df.ar), 3)
## abrasive cleaner artif. sweetener baby cosmetics
## 0.004 0.003 0.001
## baby food bags baking powder
## 0.000 0.000 0.018
## bathroom cleaner beef berries
## 0.003 0.052 0.033
## beverages bottled beer bottled water
## 0.026 0.081 0.111
## brandy brown bread butter
## 0.004 0.065 0.055
## butter milk cake bar candles
## 0.028 0.013 0.009
## candy canned beer canned fish
## 0.030 0.078 0.015
## canned fruit canned vegetables cat food
## 0.003 0.011 0.023
## cereals chewing gum chicken
## 0.006 0.021 0.043
## chocolate chocolate marshmallow citrus fruit
## 0.050 0.009 0.083
## cleaner cling film/bags cocoa drinks
## 0.005 0.011 0.002
## coffee condensed milk cooking chocolate
## 0.058 0.010 0.003
## cookware cream cream cheese
## 0.003 0.001 0.040
## curd curd cheese decalcifier
## 0.053 0.005 0.002
## dental care dessert detergent
## 0.006 0.037 0.019
## dish cleaner dishes dog food
## 0.010 0.018 0.009
## domestic eggs female sanitary products finished products
## 0.063 0.006 0.007
## fish flour flower (seeds)
## 0.003 0.017 0.010
## flower soil/fertilizer frankfurter frozen chicken
## 0.002 0.059 0.001
## frozen dessert frozen fish frozen fruits
## 0.011 0.012 0.001
## frozen meals frozen potato products frozen vegetables
## 0.028 0.008 0.048
## fruit/vegetable juice grapes hair spray
## 0.072 0.022 0.001
## ham hamburger meat hard cheese
## 0.026 0.033 0.025
## herbs honey house keeping products
## 0.016 0.002 0.008
## hygiene articles ice cream instant coffee
## 0.033 0.025 0.007
## Instant food products jam ketchup
## 0.008 0.005 0.004
## kitchen towels kitchen utensil light bulbs
## 0.006 0.000 0.004
## liqueur liquor liquor (appetizer)
## 0.001 0.011 0.008
## liver loaf long life bakery product make up remover
## 0.005 0.037 0.001
## male cosmetics margarine mayonnaise
## 0.005 0.059 0.009
## meat meat spreads misc. beverages
## 0.026 0.004 0.028
## mustard napkins newspapers
## 0.012 0.052 0.080
## nut snack nuts/prunes oil
## 0.003 0.003 0.028
## onions organic products organic sausage
## 0.031 0.002 0.002
## other vegetables packaged fruit/vegetables pasta
## 0.193 0.013 0.015
## pastry pet care photo/film
## 0.089 0.009 0.009
## pickled vegetables pip fruit popcorn
## 0.018 0.076 0.007
## pork potato products potted plants
## 0.058 0.003 0.017
## preservation products processed cheese prosecco
## 0.000 0.017 0.002
## pudding powder ready soups red/blush wine
## 0.002 0.002 0.019
## rice roll products rolls/buns
## 0.008 0.010 0.184
## root vegetables rubbing alcohol rum
## 0.109 0.001 0.004
## salad dressing salt salty snack
## 0.001 0.011 0.038
## sauces sausage seasonal products
## 0.005 0.094 0.014
## semi-finished bread shopping bags skin care
## 0.018 0.099 0.004
## sliced cheese snack products soap
## 0.025 0.003 0.003
## soda soft cheese softener
## 0.174 0.017 0.005
## sound storage medium soups sparkling wine
## 0.000 0.007 0.006
## specialty bar specialty cheese specialty chocolate
## 0.027 0.009 0.030
## specialty fat specialty vegetables spices
## 0.004 0.002 0.005
## spread cheese sugar sweet spreads
## 0.011 0.034 0.009
## syrup tea tidbits
## 0.003 0.004 0.002
## toilet cleaner tropical fruit turkey
## 0.001 0.105 0.008
## UHT-milk vinegar waffles
## 0.033 0.007 0.038
## whipped/sour cream whisky white bread
## 0.072 0.001 0.042
## white wine whole milk yogurt
## 0.019 0.256 0.140
## zwieback
## 0.007
Absolute item frequency (the number of transactions that contain each item):
itemFrequency(df.ar, type = "absolute")
## abrasive cleaner artif. sweetener baby cosmetics
## 35 32 6
## baby food bags baking powder
## 1 4 174
## bathroom cleaner beef berries
## 27 516 327
## beverages bottled beer bottled water
## 256 792 1087
## brandy brown bread butter
## 41 638 545
## butter milk cake bar candles
## 275 130 88
## candy canned beer canned fish
## 294 764 148
## canned fruit canned vegetables cat food
## 32 106 229
## cereals chewing gum chicken
## 56 207 422
## chocolate chocolate marshmallow citrus fruit
## 488 89 814
## cleaner cling film/bags cocoa drinks
## 50 112 22
## coffee condensed milk cooking chocolate
## 571 101 25
## cookware cream cream cheese
## 27 13 390
## curd curd cheese decalcifier
## 524 50 15
## dental care dessert detergent
## 57 365 189
## dish cleaner dishes dog food
## 103 173 84
## domestic eggs female sanitary products finished products
## 624 60 64
## fish flour flower (seeds)
## 29 171 102
## flower soil/fertilizer frankfurter frozen chicken
## 19 580 6
## frozen dessert frozen fish frozen fruits
## 106 115 12
## frozen meals frozen potato products frozen vegetables
## 279 83 473
## fruit/vegetable juice grapes hair spray
## 711 220 11
## ham hamburger meat hard cheese
## 256 327 241
## herbs honey house keeping products
## 160 15 82
## hygiene articles ice cream instant coffee
## 324 246 73
## Instant food products jam ketchup
## 79 53 42
## kitchen towels kitchen utensil light bulbs
## 59 4 41
## liqueur liquor liquor (appetizer)
## 9 109 78
## liver loaf long life bakery product make up remover
## 50 368 8
## male cosmetics margarine mayonnaise
## 45 576 90
## meat meat spreads misc. beverages
## 254 42 279
## mustard napkins newspapers
## 118 515 785
## nut snack nuts/prunes oil
## 31 33 276
## onions organic products organic sausage
## 305 16 22
## other vegetables packaged fruit/vegetables pasta
## 1903 128 148
## pastry pet care photo/film
## 875 93 91
## pickled vegetables pip fruit popcorn
## 176 744 71
## pork potato products potted plants
## 567 28 170
## preservation products processed cheese prosecco
## 2 163 20
## pudding powder ready soups red/blush wine
## 23 18 189
## rice roll products rolls/buns
## 75 101 1809
## root vegetables rubbing alcohol rum
## 1072 10 44
## salad dressing salt salty snack
## 8 106 372
## sauces sausage seasonal products
## 54 924 140
## semi-finished bread shopping bags skin care
## 174 969 35
## sliced cheese snack products soap
## 241 30 26
## soda soft cheese softener
## 1715 168 54
## sound storage medium soups sparkling wine
## 1 67 55
## specialty bar specialty cheese specialty chocolate
## 269 84 299
## specialty fat specialty vegetables spices
## 36 17 51
## spread cheese sugar sweet spreads
## 110 333 89
## syrup tea tidbits
## 32 38 23
## toilet cleaner tropical fruit turkey
## 7 1032 80
## UHT-milk vinegar waffles
## 329 64 378
## whipped/sour cream whisky white bread
## 705 8 414
## white wine whole milk yogurt
## 187 2513 1372
## zwieback
## 68
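The two frequency tables are linked by the number of transactions: a relative frequency is the absolute count divided by 9835, the transaction count reported in the algorithm logs. The following is a small sketch in base R that checks this for the three most frequent items; the variable names are illustrative and the counts are copied from the output above.

```r
# Counts copied from the absolute frequency output; names are illustrative.
n_transactions <- 9835
abs_count <- c("whole milk" = 2513, "other vegetables" = 1903, "rolls/buns" = 1809)

# Relative frequency = absolute count / number of transactions.
rel_freq <- round(abs_count / n_transactions, 3)
print(rel_freq)
```

The rounded values should agree with the 0.256, 0.193 and 0.184 listed for these items in the relative frequency table.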
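The cross table counts how often each pair of items appears in the same transaction. Only the rows for the three most frequent products are shown because the full table is large.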
ctab <- crossTable(df.ar, sort = TRUE)
ctab[1:3,]
## whole milk other vegetables rolls/buns soda yogurt
## whole milk 2513 736 557 394 551
## other vegetables 736 1903 419 322 427
## rolls/buns 557 419 1809 377 338
## bottled water root vegetables tropical fruit shopping bags
## whole milk 338 481 416 241
## other vegetables 244 466 353 228
## rolls/buns 238 239 242 192
## sausage pastry citrus fruit bottled beer newspapers
## whole milk 294 327 300 201 269
## other vegetables 265 222 284 159 190
## rolls/buns 301 206 165 134 194
## canned beer pip fruit fruit/vegetable juice whipped/sour cream
## whole milk 87 296 262 317
## other vegetables 89 257 207 284
## rolls/buns 111 137 143 144
## brown bread domestic eggs frankfurter margarine coffee pork
## whole milk 248 295 202 238 184 218
## other vegetables 184 219 162 194 132 213
## rolls/buns 124 154 189 145 108 111
## butter curd beef napkins chocolate frozen vegetables chicken
## whole milk 271 257 209 194 164 201 173
## other vegetables 197 169 194 142 125 175 176
## rolls/buns 132 99 134 115 116 100 95
## white bread cream cheese waffles salty snack
## whole milk 168 162 125 110
## other vegetables 135 135 99 106
## rolls/buns 64 98 90 49
## long life bakery product dessert sugar UHT-milk berries
## whole milk 133 135 148 39 116
## other vegetables 105 114 106 80 101
## rolls/buns 78 67 69 63 65
## hamburger meat hygiene articles onions specialty chocolate
## whole milk 145 126 119 79
## other vegetables 136 94 140 60
## rolls/buns 85 58 67 55
## candy frozen meals misc. beverages oil butter milk
## whole milk 81 97 69 111 114
## other vegetables 68 74 55 98 102
## rolls/buns 70 48 46 50 75
## specialty bar beverages ham meat ice cream hard cheese
## whole milk 64 67 113 98 58 99
## other vegetables 55 51 90 98 50 93
## rolls/buns 55 53 68 68 33 58
## sliced cheese cat food grapes chewing gum detergent
## whole milk 106 87 72 50 88
## other vegetables 89 64 89 45 63
## rolls/buns 75 39 47 38 30
## red/blush wine white wine pickled vegetables baking powder
## whole milk 39 26 70 91
## other vegetables 49 22 63 72
## rolls/buns 39 26 42 35
## semi-finished bread dishes flour potted plants soft cheese
## whole milk 70 52 83 68 74
## other vegetables 51 59 62 43 70
## rolls/buns 34 32 37 28 53
## processed cheese herbs canned fish pasta seasonal products
## whole milk 69 76 47 60 37
## other vegetables 54 76 50 42 36
## rolls/buns 46 30 40 24 27
## cake bar packaged fruit/vegetables mustard frozen fish
## whole milk 55 39 51 49
## other vegetables 37 31 32 46
## rolls/buns 31 26 42 24
## cling film/bags spread cheese liquor canned vegetables
## whole milk 36 31 6 38
## other vegetables 32 30 13 46
## rolls/buns 19 41 11 21
## frozen dessert salt dish cleaner flower (seeds) condensed milk
## whole milk 39 38 29 39 24
## other vegetables 36 36 23 37 25
## rolls/buns 29 21 22 18 22
## roll products pet care photo/film mayonnaise
## whole milk 46 26 23 33
## other vegetables 47 19 11 35
## rolls/buns 21 9 11 27
## chocolate marshmallow sweet spreads candles dog food
## whole milk 31 35 30 29
## other vegetables 20 24 23 22
## rolls/buns 22 16 13 13
## specialty cheese frozen potato products house keeping products
## whole milk 37 34 38
## other vegetables 42 26 27
## rolls/buns 11 20 17
## turkey Instant food products liquor (appetizer) rice
## whole milk 36 30 16 46
## other vegetables 39 27 14 39
## rolls/buns 19 23 9 15
## instant coffee popcorn zwieback soups finished products
## whole milk 22 26 17 29 14
## other vegetables 19 16 17 31 20
## rolls/buns 18 11 14 13 9
## vinegar female sanitary products kitchen towels dental care
## whole milk 26 20 27 18
## other vegetables 24 14 22 20
## rolls/buns 16 16 12 14
## cereals sparkling wine sauces softener jam spices cleaner
## whole milk 36 10 21 21 29 14 23
## other vegetables 20 15 15 16 18 19 16
## rolls/buns 11 6 8 9 13 11 7
## curd cheese liver loaf male cosmetics rum ketchup meat spreads
## whole milk 23 21 8 17 15 13
## other vegetables 21 15 8 15 15 9
## rolls/buns 16 15 7 9 5 13
## brandy light bulbs tea specialty fat abrasive cleaner
## whole milk 5 9 16 12 16
## other vegetables 6 13 15 11 16
## rolls/buns 11 7 11 7 5
## skin care nuts/prunes artif. sweetener canned fruit syrup
## whole milk 16 12 11 13 9
## other vegetables 12 9 10 11 11
## rolls/buns 14 10 7 6 6
## nut snack snack products fish potato products bathroom cleaner
## whole milk 5 8 9 12 6
## other vegetables 8 9 7 8 10
## rolls/buns 6 11 3 6 6
## cookware soap cooking chocolate pudding powder tidbits
## whole milk 4 11 13 13 9
## other vegetables 5 5 7 8 4
## rolls/buns 6 6 6 2 12
## cocoa drinks organic sausage prosecco flower soil/fertilizer
## whole milk 13 9 5 2
## other vegetables 5 5 3 2
## rolls/buns 5 5 5 2
## ready soups specialty vegetables organic products decalcifier
## whole milk 8 3 4 7
## other vegetables 6 7 6 5
## rolls/buns 9 2 4 3
## honey cream frozen fruits hair spray rubbing alcohol liqueur
## whole milk 11 4 3 3 6 2
## other vegetables 3 7 8 2 4 1
## rolls/buns 4 1 2 2 1 3
## make up remover salad dressing whisky toilet cleaner
## whole milk 2 1 1 2
## other vegetables 1 5 2 2
## rolls/buns 2 1 2 0
## baby cosmetics frozen chicken bags kitchen utensil
## whole milk 3 2 1 3
## other vegetables 1 0 0 1
## rolls/buns 2 1 1 1
## preservation products baby food sound storage medium
## whole milk 1 0 0
## other vegetables 1 1 0
## rolls/buns 0 1 0
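To make the cross table easier to read, here is a minimal sketch on a made-up set of five baskets (the items and baskets are assumptions for illustration, not taken from the grocery data). The diagonal holds the count of each item and the off-diagonal cells hold the number of baskets containing both items.

```r
library(arules)

# Toy baskets, invented for illustration only.
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("milk", "bread", "yogurt"),
  c("bread"),
  c("milk")
)
trans <- as(baskets, "transactions")

# Co-occurrence counts, with items ordered by frequency.
toy.ctab <- crossTable(trans, sort = TRUE)
print(toy.ctab)

# Diagonal cell: baskets containing milk. Off-diagonal cell: baskets with both items.
toy.ctab["milk", "milk"]
toy.ctab["milk", "bread"]
```

Read the same way, the cell for whole milk and other vegetables in the real table (736) says that 736 of the 9835 transactions contain both products.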
base.elcat <- eclat(df.ar, parameter = list(supp = 0.05, maxlen = 15))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.05 1 15 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 491
##
## create itemset ...
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [28 item(s)] done [0.00s].
## creating sparse bit matrix ... [28 row(s), 9835 column(s)] done [0.00s].
## writing ... [31 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(base.elcat)
## items support count
## [1] {whole milk, yogurt} 0.05602440 551
## [2] {rolls/buns, whole milk} 0.05663447 557
## [3] {other vegetables, whole milk} 0.07483477 736
## [4] {whole milk} 0.25551601 2513
## [5] {other vegetables} 0.19349263 1903
## [6] {rolls/buns} 0.18393493 1809
## [7] {yogurt} 0.13950178 1372
## [8] {soda} 0.17437722 1715
## [9] {root vegetables} 0.10899847 1072
## [10] {tropical fruit} 0.10493137 1032
## [11] {bottled water} 0.11052364 1087
## [12] {sausage} 0.09395018 924
## [13] {shopping bags} 0.09852567 969
## [14] {citrus fruit} 0.08276563 814
## [15] {pastry} 0.08896797 875
## [16] {pip fruit} 0.07564820 744
## [17] {whipped/sour cream} 0.07168277 705
## [18] {fruit/vegetable juice} 0.07229283 711
## [19] {domestic eggs} 0.06344687 624
## [20] {newspapers} 0.07981698 785
## [21] {butter} 0.05541434 545
## [22] {margarine} 0.05856634 576
## [23] {brown bread} 0.06487036 638
## [24] {bottled beer} 0.08052872 792
## [25] {frankfurter} 0.05897306 580
## [26] {pork} 0.05765125 567
## [27] {napkins} 0.05236401 515
## [28] {curd} 0.05327911 524
## [29] {beef} 0.05246568 516
## [30] {coffee} 0.05805796 571
## [31] {canned beer} 0.07768175 764
Rules induced from the frequent itemsets found by Eclat, with a minimum confidence of 0.2:
freq.rules <- ruleInduction(base.elcat, df.ar, confidence = 0.2)
freq.rules
## set of 6 rules
inspect(freq.rules)
## lhs rhs support confidence lift itemset
## [1] {yogurt} => {whole milk} 0.05602440 0.4016035 1.571735 1
## [2] {whole milk} => {yogurt} 0.05602440 0.2192598 1.571735 1
## [3] {whole milk} => {rolls/buns} 0.05663447 0.2216474 1.205032 2
## [4] {rolls/buns} => {whole milk} 0.05663447 0.3079049 1.205032 2
## [5] {whole milk} => {other vegetables} 0.07483477 0.2928770 1.513634 3
## [6] {other vegetables} => {whole milk} 0.07483477 0.3867578 1.513634 3
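The two yogurt and whole milk rules share the same support and lift but differ in confidence, because confidence divides by the count of the left-hand side while lift treats both sides symmetrically. The sketch below recomputes the values in base R from the counts in the Eclat output; the variable names are illustrative.

```r
# Counts copied from the Eclat output above; names are illustrative.
n_total      <- 9835  # all transactions
n_yogurt     <- 1372  # transactions containing yogurt
n_whole_milk <- 2513  # transactions containing whole milk
n_both       <- 551   # transactions containing both

# Confidence depends on the direction of the rule.
conf_yogurt_to_milk <- n_both / n_yogurt
conf_milk_to_yogurt <- n_both / n_whole_milk

# Lift = joint support / product of the individual supports (direction-free).
lift <- (n_both / n_total) / ((n_yogurt / n_total) * (n_whole_milk / n_total))

print(round(c(conf_yogurt_to_milk, conf_milk_to_yogurt, lift), 4))
```

These figures should match the confidence of about 0.4016 and 0.2193 and the lift of about 1.5717 reported for rules [1] and [2].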
Rules mined with Apriori (minimum support 0.02, minimum confidence 0.2), sorted by confidence:
base.apriori <- apriori(df.ar, parameter = list(supp = 0.02, conf = 0.2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.02 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 196
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [59 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [73 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules.by.conf <-
sort(base.apriori, by = "confidence", decreasing = TRUE)
inspect(head(rules.by.conf))
## lhs rhs support confidence coverage lift count
## [1] {other vegetables, yogurt} => {whole milk} 0.02226741 0.5128806 0.04341637 2.007235 219
## [2] {butter} => {whole milk} 0.02755465 0.4972477 0.05541434 1.946053 271
## [3] {curd} => {whole milk} 0.02613116 0.4904580 0.05327911 1.919481 257
## [4] {other vegetables, root vegetables} => {whole milk} 0.02318251 0.4892704 0.04738180 1.914833 228
## [5] {root vegetables, whole milk} => {other vegetables} 0.02318251 0.4740125 0.04890696 2.449770 228
## [6] {domestic eggs} => {whole milk} 0.02999492 0.4727564 0.06344687 1.850203 295
Rules by lift:
rules.by.lift <- sort(base.apriori, by = "lift", decreasing = TRUE)
inspect(head(rules.by.lift))
## lhs rhs support confidence coverage lift count
## [1] {other vegetables, whole milk} => {root vegetables} 0.02318251 0.3097826 0.07483477 2.842082 228
## [2] {pip fruit} => {tropical fruit} 0.02043721 0.2701613 0.07564820 2.574648 201
## [3] {root vegetables, whole milk} => {other vegetables} 0.02318251 0.4740125 0.04890696 2.449770 228
## [4] {root vegetables} => {other vegetables} 0.04738180 0.4347015 0.10899847 2.246605 466
## [5] {other vegetables} => {root vegetables} 0.04738180 0.2448765 0.19349263 2.246605 466
## [6] {other vegetables, whole milk} => {yogurt} 0.02226741 0.2975543 0.07483477 2.132979 219
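The Apriori output adds a coverage column, which is the support of the left-hand side alone. As a rough check, the sketch below derives all four measures for the top rule by lift from raw counts. The helper `rule_measures` is hypothetical (it is not part of arules); the counts come from the outputs above: 228 for the whole rule, 736 for {other vegetables, whole milk} and 1072 for {root vegetables}.

```r
# Hypothetical helper, not an arules function: measures of lhs => rhs from counts.
rule_measures <- function(n_both, n_lhs, n_rhs, n_total) {
  support    <- n_both / n_total              # share of transactions with lhs and rhs
  coverage   <- n_lhs / n_total               # share of transactions with lhs
  confidence <- n_both / n_lhs                # share of lhs transactions that also hold rhs
  lift       <- confidence / (n_rhs / n_total)  # confidence relative to the rhs base rate
  c(support = support, confidence = confidence, coverage = coverage, lift = lift)
}

# {other vegetables, whole milk} => {root vegetables}
m <- rule_measures(n_both = 228, n_lhs = 736, n_rhs = 1072, n_total = 9835)
print(round(m, 4))
```

This also shows why the two orderings differ: rules pointing to whole milk reach high confidence partly because whole milk is in about a quarter of all transactions, whereas lift divides that base rate out.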
rules.tropical <-
  apriori(
    data = df.ar,
    parameter = list(supp = 0.001, conf = 0.08),
    appearance = list(default = "lhs", rhs = "tropical fruit"),
    control = list(verbose = FALSE)
  )
rules.tropical
## set of 1795 rules
rules.tropical.byconf <- sort(rules.tropical, by = "confidence", decreasing = TRUE)
inspect(head(rules.tropical.byconf))
## lhs rhs support confidence coverage lift count
## [1] {citrus fruit,
## fruit/vegetable juice,
## grapes} => {tropical fruit} 0.001118454 0.8461538 0.001321810 8.063879 11
## [2] {ham,
## other vegetables,
## pip fruit,
## yogurt} => {tropical fruit} 0.001016777 0.8333333 0.001220132 7.941699 10
## [3] {fruit/vegetable juice,
## grapes,
## other vegetables} => {tropical fruit} 0.001118454 0.7857143 0.001423488 7.487888 11
## [4] {bottled water,
## other vegetables,
## root vegetables,
## whole milk,
## yogurt} => {tropical fruit} 0.001118454 0.7857143 0.001423488 7.487888 11
## [5] {butter,
## domestic eggs,
## other vegetables,
## whole milk,
## yogurt} => {tropical fruit} 0.001016777 0.7692308 0.001321810 7.330799 10
## [6] {ham,
## other vegetables,
## pip fruit,
## whole milk} => {tropical fruit} 0.001118454 0.7333333 0.001525165 6.988695 11
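The `appearance` argument is what restricts these rules to a single consequent. Here is a minimal sketch of the same pattern on a made-up set of five baskets (the items, baskets and thresholds are assumptions for illustration). `minlen = 2` is added so that rules with an empty left-hand side are left out.

```r
library(arules)

# Toy baskets, invented for illustration only.
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("milk", "bread", "yogurt"),
  c("bread"),
  c("milk")
)
trans <- as(baskets, "transactions")

# Only rules with "yogurt" on the right-hand side; every other item may appear on the left.
toy.rules <- apriori(
  data = trans,
  parameter = list(supp = 0.1, conf = 0.1, minlen = 2),
  appearance = list(default = "lhs", rhs = "yogurt"),
  control = list(verbose = FALSE)
)
inspect(sort(toy.rules, by = "confidence", decreasing = TRUE))
```

With a support threshold as low as the 0.001 used above, a rule can rest on as few as 10 transactions, so the count column is worth reading alongside confidence and lift.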
plot(rules.tropical)
plot(rules.tropical, method="graph")
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).
plot(rules.tropical, method="grouped matrix")
In conclusion, this study used a grocery store transaction dataset from Kaggle to derive association rules that could help the store maximize its profits. The R libraries “arules” and “arulesViz” were used for mining association rules and visualizing the results, respectively. Exploratory analysis covered the transactions themselves, the item frequencies, and a cross table of item co-occurrences. The Eclat algorithm found 31 frequent itemsets at a minimum support of 0.05, from which 6 rules were induced, and the Apriori algorithm produced 73 rules at a minimum support of 0.02 and a minimum confidence of 0.2. Sorted by confidence, the top rule was {other vegetables, yogurt} => {whole milk} (confidence 0.51); sorted by lift, it was {other vegetables, whole milk} => {root vegetables} (lift 2.84). Finally, the 1795 rules with tropical fruit on the right-hand side were visualized as a scatter plot, a graph, and a grouped matrix.