The aim of this project is to apply an association rule mining technique, the Apriori algorithm, to uncover customers' buying tendencies. The analysis relies on three standard measures for association rules: support (the share of transactions in which an item or itemset appears), confidence (the conditional probability that the right-hand side of a rule is bought given that the left-hand side is bought) and lift (how much more often the two sides occur together than would be expected if they were independent; a value above 1 indicates a positive association).
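To make the three measures concrete, here is a minimal base R sketch on five made-up baskets. The data and the helper name `support_of` are hypothetical and are not taken from the groceries file; the sketch only illustrates the definitions for the rule {yogurt} => {milk}.

```r
# Five made-up baskets (hypothetical data, not from groceries.csv)
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("milk", "bread", "yogurt"),
  c("bread"),
  c("milk", "yogurt", "butter")
)

# Hypothetical helper: share of baskets that contain all the given items
support_of <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Rule {yogurt} => {milk}
supp_rule <- support_of(c("yogurt", "milk"))    # both sides together
conf_rule <- supp_rule / support_of("yogurt")   # P(milk | yogurt)
lift_rule <- conf_rule / support_of("milk")     # confidence relative to baseline

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

In this toy data every basket with yogurt also contains milk, so confidence is 1, and because milk is in 4 of the 5 baskets the lift is 1.25.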
library(arules)
## Warning: package 'arules' was built under R version 4.3.2
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.3.2
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
## Warning: package 'arulesViz' was built under R version 4.3.2
library(arulesCBA)
## Warning: package 'arulesCBA' was built under R version 4.3.2
## [1] "C:/Users/Piotr/Downloads"
Gro<-read.transactions("market basket groceries/groceries.csv",format="basket",sep=",")
Let's see what the dataset looks like.
summary(Gro)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 169 columns (items) and a density of 0.02609146
##
## most frequent items:
## whole milk other vegetables rolls/buns soda
## 2513 1903 1809 1715
## yogurt (Other)
## 1372 34055
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55 46
## 17 18 19 20 21 22 23 24 26 27 28 29 32
## 29 14 14 9 11 4 6 1 1 1 1 3 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 4.409 6.000 32.000
##
## includes extended item information - examples:
## labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3 baby cosmetics
The dataset contains 9835 transactions (rows) and 169 distinct items (columns).
print(sum(size(Gro)))
## [1] 43367
In total, the transactions contain 43367 items, which is about 4.4 items per transaction.
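The mean basket size and the density shown in the summary follow directly from these three numbers. A small sketch of that arithmetic, using only values printed above (the variable names are ours):

```r
# Values copied from the summary() and sum(size(Gro)) output above
n_transactions <- 9835
n_items        <- 169
n_occurrences  <- 43367

# Average number of items per transaction
mean_basket_size <- n_occurrences / n_transactions

# Density: share of filled cells in the 9835 x 169 transaction matrix
density <- n_occurrences / (n_transactions * n_items)

print(round(mean_basket_size, 3))
print(round(density, 8))
```

Both results should agree with the mean (4.409) and the density (0.02609146) reported by `summary(Gro)`.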
itemFrequency(Gro,type="absolute")
## abrasive cleaner artif. sweetener baby cosmetics
## 35 32 6
## baby food bags baking powder
## 1 4 174
## bathroom cleaner beef berries
## 27 516 327
## beverages bottled beer bottled water
## 256 792 1087
## brandy brown bread butter
## 41 638 545
## butter milk cake bar candles
## 275 130 88
## candy canned beer canned fish
## 294 764 148
## canned fruit canned vegetables cat food
## 32 106 229
## cereals chewing gum chicken
## 56 207 422
## chocolate chocolate marshmallow citrus fruit
## 488 89 814
## cleaner cling film/bags cocoa drinks
## 50 112 22
## coffee condensed milk cooking chocolate
## 571 101 25
## cookware cream cream cheese
## 27 13 390
## curd curd cheese decalcifier
## 524 50 15
## dental care dessert detergent
## 57 365 189
## dish cleaner dishes dog food
## 103 173 84
## domestic eggs female sanitary products finished products
## 624 60 64
## fish flour flower (seeds)
## 29 171 102
## flower soil/fertilizer frankfurter frozen chicken
## 19 580 6
## frozen dessert frozen fish frozen fruits
## 106 115 12
## frozen meals frozen potato products frozen vegetables
## 279 83 473
## fruit/vegetable juice grapes hair spray
## 711 220 11
## ham hamburger meat hard cheese
## 256 327 241
## herbs honey house keeping products
## 160 15 82
## hygiene articles ice cream instant coffee
## 324 246 73
## Instant food products jam ketchup
## 79 53 42
## kitchen towels kitchen utensil light bulbs
## 59 4 41
## liqueur liquor liquor (appetizer)
## 9 109 78
## liver loaf long life bakery product make up remover
## 50 368 8
## male cosmetics margarine mayonnaise
## 45 576 90
## meat meat spreads misc. beverages
## 254 42 279
## mustard napkins newspapers
## 118 515 785
## nut snack nuts/prunes oil
## 31 33 276
## onions organic products organic sausage
## 305 16 22
## other vegetables packaged fruit/vegetables pasta
## 1903 128 148
## pastry pet care photo/film
## 875 93 91
## pickled vegetables pip fruit popcorn
## 176 744 71
## pork potato products potted plants
## 567 28 170
## preservation products processed cheese prosecco
## 2 163 20
## pudding powder ready soups red/blush wine
## 23 18 189
## rice roll products rolls/buns
## 75 101 1809
## root vegetables rubbing alcohol rum
## 1072 10 44
## salad dressing salt salty snack
## 8 106 372
## sauces sausage seasonal products
## 54 924 140
## semi-finished bread shopping bags skin care
## 174 969 35
## sliced cheese snack products soap
## 241 30 26
## soda soft cheese softener
## 1715 168 54
## sound storage medium soups sparkling wine
## 1 67 55
## specialty bar specialty cheese specialty chocolate
## 269 84 299
## specialty fat specialty vegetables spices
## 36 17 51
## spread cheese sugar sweet spreads
## 110 333 89
## syrup tea tidbits
## 32 38 23
## toilet cleaner tropical fruit turkey
## 7 1032 80
## UHT-milk vinegar waffles
## 329 64 378
## whipped/sour cream whisky white bread
## 705 8 414
## white wine whole milk yogurt
## 187 2513 1372
## zwieback
## 68
itemFrequency(Gro,type="relative")
## abrasive cleaner artif. sweetener baby cosmetics
## 0.0035587189 0.0032536858 0.0006100661
## baby food bags baking powder
## 0.0001016777 0.0004067107 0.0176919166
## bathroom cleaner beef berries
## 0.0027452974 0.0524656838 0.0332486019
## beverages bottled beer bottled water
## 0.0260294865 0.0805287239 0.1105236401
## brandy brown bread butter
## 0.0041687850 0.0648703610 0.0554143366
## butter milk cake bar candles
## 0.0279613625 0.0132180986 0.0089476360
## candy canned beer canned fish
## 0.0298932384 0.0776817489 0.0150482969
## canned fruit canned vegetables cat food
## 0.0032536858 0.0107778343 0.0232841891
## cereals chewing gum chicken
## 0.0056939502 0.0210472801 0.0429079817
## chocolate chocolate marshmallow citrus fruit
## 0.0496187087 0.0090493137 0.0827656329
## cleaner cling film/bags cocoa drinks
## 0.0050838841 0.0113879004 0.0022369090
## coffee condensed milk cooking chocolate
## 0.0580579563 0.0102694459 0.0025419420
## cookware cream cream cheese
## 0.0027452974 0.0013218099 0.0396542959
## curd curd cheese decalcifier
## 0.0532791052 0.0050838841 0.0015251652
## dental care dessert detergent
## 0.0057956279 0.0371123538 0.0192170819
## dish cleaner dishes dog food
## 0.0104728012 0.0175902389 0.0085409253
## domestic eggs female sanitary products finished products
## 0.0634468734 0.0061006609 0.0065073716
## fish flour flower (seeds)
## 0.0029486528 0.0173868836 0.0103711235
## flower soil/fertilizer frankfurter frozen chicken
## 0.0019318760 0.0589730554 0.0006100661
## frozen dessert frozen fish frozen fruits
## 0.0107778343 0.0116929334 0.0012201322
## frozen meals frozen potato products frozen vegetables
## 0.0283680732 0.0084392476 0.0480935435
## fruit/vegetable juice grapes hair spray
## 0.0722928317 0.0223690900 0.0011184545
## ham hamburger meat hard cheese
## 0.0260294865 0.0332486019 0.0245043213
## herbs honey house keeping products
## 0.0162684291 0.0015251652 0.0083375699
## hygiene articles ice cream instant coffee
## 0.0329435689 0.0250127097 0.0074224708
## Instant food products jam ketchup
## 0.0080325369 0.0053889171 0.0042704626
## kitchen towels kitchen utensil light bulbs
## 0.0059989832 0.0004067107 0.0041687850
## liqueur liquor liquor (appetizer)
## 0.0009150991 0.0110828673 0.0079308592
## liver loaf long life bakery product make up remover
## 0.0050838841 0.0374173869 0.0008134215
## male cosmetics margarine mayonnaise
## 0.0045754957 0.0585663447 0.0091509914
## meat meat spreads misc. beverages
## 0.0258261312 0.0042704626 0.0283680732
## mustard napkins newspapers
## 0.0119979664 0.0523640061 0.0798169802
## nut snack nuts/prunes oil
## 0.0031520081 0.0033553635 0.0280630402
## onions organic products organic sausage
## 0.0310116929 0.0016268429 0.0022369090
## other vegetables packaged fruit/vegetables pasta
## 0.1934926284 0.0130147433 0.0150482969
## pastry pet care photo/film
## 0.0889679715 0.0094560244 0.0092526690
## pickled vegetables pip fruit popcorn
## 0.0178952720 0.0756481952 0.0072191154
## pork potato products potted plants
## 0.0576512456 0.0028469751 0.0172852059
## preservation products processed cheese prosecco
## 0.0002033554 0.0165734621 0.0020335536
## pudding powder ready soups red/blush wine
## 0.0023385867 0.0018301983 0.0192170819
## rice roll products rolls/buns
## 0.0076258261 0.0102694459 0.1839349263
## root vegetables rubbing alcohol rum
## 0.1089984748 0.0010167768 0.0044738180
## salad dressing salt salty snack
## 0.0008134215 0.0107778343 0.0378240976
## sauces sausage seasonal products
## 0.0054905948 0.0939501779 0.0142348754
## semi-finished bread shopping bags skin care
## 0.0176919166 0.0985256736 0.0035587189
## sliced cheese snack products soap
## 0.0245043213 0.0030503305 0.0026436197
## soda soft cheese softener
## 0.1743772242 0.0170818505 0.0054905948
## sound storage medium soups sparkling wine
## 0.0001016777 0.0068124047 0.0055922725
## specialty bar specialty cheese specialty chocolate
## 0.0273512964 0.0085409253 0.0304016268
## specialty fat specialty vegetables spices
## 0.0036603965 0.0017285206 0.0051855618
## spread cheese sugar sweet spreads
## 0.0111845450 0.0338586680 0.0090493137
## syrup tea tidbits
## 0.0032536858 0.0038637519 0.0023385867
## toilet cleaner tropical fruit turkey
## 0.0007117438 0.1049313676 0.0081342145
## UHT-milk vinegar waffles
## 0.0334519573 0.0065073716 0.0384341637
## whipped/sour cream whisky white bread
## 0.0716827656 0.0008134215 0.0420945602
## white wine whole milk yogurt
## 0.0190137265 0.2555160142 0.1395017794
## zwieback
## 0.0069140824
What is interesting to see from the output is that whole milk appears in about 25.6% of all transactions (2513 of 9835).
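The relative frequency reported for whole milk is a share of transactions, which is not the same as its share of all items bought. A short sketch with the counts printed above shows the difference (the variable names are ours):

```r
# Counts copied from the output above
milk_count     <- 2513    # transactions containing whole milk
n_transactions <- 9835
n_occurrences  <- 43367   # all item occurrences across all transactions

# What itemFrequency(type = "relative") reports: share of transactions
share_of_transactions <- milk_count / n_transactions

# Share of all purchased items that are whole milk
share_of_items <- milk_count / n_occurrences

print(round(c(transactions = share_of_transactions,
              items = share_of_items), 4))
```

Roughly one transaction in four contains whole milk, while whole milk makes up a little under 6% of all items sold.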
itemFrequencyPlot(Gro, topN=15, type="relative")
itemFrequencyPlot(Gro, topN=15, type="absolute")
As noted above, the most frequently bought product is whole milk, followed by other vegetables, rolls/buns, soda and yogurt; these make up the top 5 by frequency.
We can now move on to the algorithm: we will use the Apriori method.
We will set the minimum support to 1% and the minimum confidence to 50%; minlen = 2 keeps only rules with at least two items, i.e. with a non-empty left-hand side.
groceryrules <- apriori(Gro, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 98
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
groceryrules
## set of 15 rules
The Apriori algorithm produced 15 rules, which we will now inspect and draw conclusions from.
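The 1% support threshold can also be expressed as a number of transactions. The sketch below is plain arithmetic on the values above; reading the log line "Absolute minimum support count: 98" as the truncated value of 98.35 is our assumption, not something the output states.

```r
n_transactions <- 9835
min_support    <- 0.01

# 1% of the transactions, as a (fractional) count
threshold <- min_support * n_transactions

# Counts are whole numbers, so the smallest count that reaches 1% support
min_count <- ceiling(threshold)

print(threshold)
print(floor(threshold))   # matches the value shown in the apriori log
print(min_count)
```

This is consistent with the rules found: the least frequent one, {curd, yogurt} => {whole milk}, has a count of 99.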
inspect(groceryrules[1:15])
## lhs rhs support
## [1] {curd, yogurt} => {whole milk} 0.01006609
## [2] {butter, other vegetables} => {whole milk} 0.01148958
## [3] {domestic eggs, other vegetables} => {whole milk} 0.01230300
## [4] {whipped/sour cream, yogurt} => {whole milk} 0.01087951
## [5] {other vegetables, whipped/sour cream} => {whole milk} 0.01464159
## [6] {other vegetables, pip fruit} => {whole milk} 0.01352313
## [7] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [8] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [9] {root vegetables, tropical fruit} => {whole milk} 0.01199797
## [10] {tropical fruit, yogurt} => {whole milk} 0.01514997
## [11] {root vegetables, yogurt} => {other vegetables} 0.01291307
## [12] {root vegetables, yogurt} => {whole milk} 0.01453991
## [13] {rolls/buns, root vegetables} => {other vegetables} 0.01220132
## [14] {rolls/buns, root vegetables} => {whole milk} 0.01270971
## [15] {other vegetables, yogurt} => {whole milk} 0.02226741
## confidence coverage lift count
## [1] 0.5823529 0.01728521 2.279125 99
## [2] 0.5736041 0.02003050 2.244885 113
## [3] 0.5525114 0.02226741 2.162336 121
## [4] 0.5245098 0.02074225 2.052747 107
## [5] 0.5070423 0.02887646 1.984385 144
## [6] 0.5175097 0.02613116 2.025351 133
## [7] 0.5862069 0.01769192 3.029608 102
## [8] 0.5845411 0.02104728 3.020999 121
## [9] 0.5700483 0.02104728 2.230969 118
## [10] 0.5173611 0.02928317 2.024770 149
## [11] 0.5000000 0.02582613 2.584078 127
## [12] 0.5629921 0.02582613 2.203354 143
## [13] 0.5020921 0.02430097 2.594890 120
## [14] 0.5230126 0.02430097 2.046888 125
## [15] 0.5128806 0.04341637 2.007235 219
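As a sanity check, the quality measures of a single rule can be recomputed by hand. The sketch below does this for rule [1], {curd, yogurt} => {whole milk}. The rule count (99) and the whole milk count (2513) are printed above; the left-hand-side count of 170 is not printed and is derived here from the coverage column, so treat it as an inferred value.

```r
n_transactions <- 9835
count_rule     <- 99      # transactions with curd, yogurt and whole milk
count_rhs      <- 2513    # transactions with whole milk

# Inferred: coverage * number of transactions, rounded to a whole count
count_lhs <- round(0.01728521 * n_transactions)

support    <- count_rule / n_transactions
coverage   <- count_lhs / n_transactions
confidence <- count_rule / count_lhs                    # P(rhs | lhs)
lift       <- confidence / (count_rhs / n_transactions) # vs. baseline P(rhs)

print(round(c(support = support, coverage = coverage,
              confidence = confidence, lift = lift), 6))
```

The results should agree with the first row of the `inspect()` output: support 0.01006609, confidence 0.5823529, coverage 0.01728521 and lift 2.279125.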
Let's sort the rules by confidence, lift and support in turn to see which rules rank highest on each measure.
sorted_rules1 <- sort(groceryrules, by = "confidence")
inspect(sorted_rules1[1:15])
## lhs rhs support
## [1] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [2] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [3] {curd, yogurt} => {whole milk} 0.01006609
## [4] {butter, other vegetables} => {whole milk} 0.01148958
## [5] {root vegetables, tropical fruit} => {whole milk} 0.01199797
## [6] {root vegetables, yogurt} => {whole milk} 0.01453991
## [7] {domestic eggs, other vegetables} => {whole milk} 0.01230300
## [8] {whipped/sour cream, yogurt} => {whole milk} 0.01087951
## [9] {rolls/buns, root vegetables} => {whole milk} 0.01270971
## [10] {other vegetables, pip fruit} => {whole milk} 0.01352313
## [11] {tropical fruit, yogurt} => {whole milk} 0.01514997
## [12] {other vegetables, yogurt} => {whole milk} 0.02226741
## [13] {other vegetables, whipped/sour cream} => {whole milk} 0.01464159
## [14] {rolls/buns, root vegetables} => {other vegetables} 0.01220132
## [15] {root vegetables, yogurt} => {other vegetables} 0.01291307
## confidence coverage lift count
## [1] 0.5862069 0.01769192 3.029608 102
## [2] 0.5845411 0.02104728 3.020999 121
## [3] 0.5823529 0.01728521 2.279125 99
## [4] 0.5736041 0.02003050 2.244885 113
## [5] 0.5700483 0.02104728 2.230969 118
## [6] 0.5629921 0.02582613 2.203354 143
## [7] 0.5525114 0.02226741 2.162336 121
## [8] 0.5245098 0.02074225 2.052747 107
## [9] 0.5230126 0.02430097 2.046888 125
## [10] 0.5175097 0.02613116 2.025351 133
## [11] 0.5173611 0.02928317 2.024770 149
## [12] 0.5128806 0.04341637 2.007235 219
## [13] 0.5070423 0.02887646 1.984385 144
## [14] 0.5020921 0.02430097 2.594890 120
## [15] 0.5000000 0.02582613 2.584078 127
sorted_rules2 <- sort(groceryrules, by = "lift")
inspect(sorted_rules2[1:15])
## lhs rhs support
## [1] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [2] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [3] {rolls/buns, root vegetables} => {other vegetables} 0.01220132
## [4] {root vegetables, yogurt} => {other vegetables} 0.01291307
## [5] {curd, yogurt} => {whole milk} 0.01006609
## [6] {butter, other vegetables} => {whole milk} 0.01148958
## [7] {root vegetables, tropical fruit} => {whole milk} 0.01199797
## [8] {root vegetables, yogurt} => {whole milk} 0.01453991
## [9] {domestic eggs, other vegetables} => {whole milk} 0.01230300
## [10] {whipped/sour cream, yogurt} => {whole milk} 0.01087951
## [11] {rolls/buns, root vegetables} => {whole milk} 0.01270971
## [12] {other vegetables, pip fruit} => {whole milk} 0.01352313
## [13] {tropical fruit, yogurt} => {whole milk} 0.01514997
## [14] {other vegetables, yogurt} => {whole milk} 0.02226741
## [15] {other vegetables, whipped/sour cream} => {whole milk} 0.01464159
## confidence coverage lift count
## [1] 0.5862069 0.01769192 3.029608 102
## [2] 0.5845411 0.02104728 3.020999 121
## [3] 0.5020921 0.02430097 2.594890 120
## [4] 0.5000000 0.02582613 2.584078 127
## [5] 0.5823529 0.01728521 2.279125 99
## [6] 0.5736041 0.02003050 2.244885 113
## [7] 0.5700483 0.02104728 2.230969 118
## [8] 0.5629921 0.02582613 2.203354 143
## [9] 0.5525114 0.02226741 2.162336 121
## [10] 0.5245098 0.02074225 2.052747 107
## [11] 0.5230126 0.02430097 2.046888 125
## [12] 0.5175097 0.02613116 2.025351 133
## [13] 0.5173611 0.02928317 2.024770 149
## [14] 0.5128806 0.04341637 2.007235 219
## [15] 0.5070423 0.02887646 1.984385 144
sorted_rules3<-sort(groceryrules, by = "support")
inspect(sorted_rules3[1:15])
## lhs rhs support
## [1] {other vegetables, yogurt} => {whole milk} 0.02226741
## [2] {tropical fruit, yogurt} => {whole milk} 0.01514997
## [3] {other vegetables, whipped/sour cream} => {whole milk} 0.01464159
## [4] {root vegetables, yogurt} => {whole milk} 0.01453991
## [5] {other vegetables, pip fruit} => {whole milk} 0.01352313
## [6] {root vegetables, yogurt} => {other vegetables} 0.01291307
## [7] {rolls/buns, root vegetables} => {whole milk} 0.01270971
## [8] {domestic eggs, other vegetables} => {whole milk} 0.01230300
## [9] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [10] {rolls/buns, root vegetables} => {other vegetables} 0.01220132
## [11] {root vegetables, tropical fruit} => {whole milk} 0.01199797
## [12] {butter, other vegetables} => {whole milk} 0.01148958
## [13] {whipped/sour cream, yogurt} => {whole milk} 0.01087951
## [14] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [15] {curd, yogurt} => {whole milk} 0.01006609
## confidence coverage lift count
## [1] 0.5128806 0.04341637 2.007235 219
## [2] 0.5173611 0.02928317 2.024770 149
## [3] 0.5070423 0.02887646 1.984385 144
## [4] 0.5629921 0.02582613 2.203354 143
## [5] 0.5175097 0.02613116 2.025351 133
## [6] 0.5000000 0.02582613 2.584078 127
## [7] 0.5230126 0.02430097 2.046888 125
## [8] 0.5525114 0.02226741 2.162336 121
## [9] 0.5845411 0.02104728 3.020999 121
## [10] 0.5020921 0.02430097 2.594890 120
## [11] 0.5700483 0.02104728 2.230969 118
## [12] 0.5736041 0.02003050 2.244885 113
## [13] 0.5245098 0.02074225 2.052747 107
## [14] 0.5862069 0.01769192 3.029608 102
## [15] 0.5823529 0.01728521 2.279125 99
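Looking at the output we can conclude that the strongest rules are:
citrus fruit, root vegetables => other vegetables for confidence (0.586)
citrus fruit, root vegetables => other vegetables for lift (3.03)
other vegetables, yogurt => whole milk for support (0.022)
Among the rules that lead to whole milk, curd, yogurt => whole milk has the highest confidence (0.582).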
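The top rule for each measure can be cross-checked without arules by copying the 15 rules from the `inspect()` output into a data frame. This is only a sketch over the printed numbers; the object names `rules` and `top` are ours.

```r
# Values copied from inspect(groceryrules[1:15]), in the printed order
rules <- data.frame(
  rule = c(
    "{curd, yogurt} => {whole milk}",
    "{butter, other vegetables} => {whole milk}",
    "{domestic eggs, other vegetables} => {whole milk}",
    "{whipped/sour cream, yogurt} => {whole milk}",
    "{other vegetables, whipped/sour cream} => {whole milk}",
    "{other vegetables, pip fruit} => {whole milk}",
    "{citrus fruit, root vegetables} => {other vegetables}",
    "{root vegetables, tropical fruit} => {other vegetables}",
    "{root vegetables, tropical fruit} => {whole milk}",
    "{tropical fruit, yogurt} => {whole milk}",
    "{root vegetables, yogurt} => {other vegetables}",
    "{root vegetables, yogurt} => {whole milk}",
    "{rolls/buns, root vegetables} => {other vegetables}",
    "{rolls/buns, root vegetables} => {whole milk}",
    "{other vegetables, yogurt} => {whole milk}"
  ),
  support = c(0.01006609, 0.01148958, 0.01230300, 0.01087951, 0.01464159,
              0.01352313, 0.01037112, 0.01230300, 0.01199797, 0.01514997,
              0.01291307, 0.01453991, 0.01220132, 0.01270971, 0.02226741),
  confidence = c(0.5823529, 0.5736041, 0.5525114, 0.5245098, 0.5070423,
                 0.5175097, 0.5862069, 0.5845411, 0.5700483, 0.5173611,
                 0.5000000, 0.5629921, 0.5020921, 0.5230126, 0.5128806),
  lift = c(2.279125, 2.244885, 2.162336, 2.052747, 1.984385,
           2.025351, 3.029608, 3.020999, 2.230969, 2.024770,
           2.584078, 2.203354, 2.594890, 2.046888, 2.007235),
  stringsAsFactors = FALSE
)

# Rule with the highest value on each measure
top <- sapply(c("confidence", "lift", "support"),
              function(m) rules$rule[which.max(rules[[m]])])
print(top)
```

With the arules objects still in memory, `inspect(head(sort(groceryrules, by = "lift"), 1))` gives the same kind of answer directly.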
plot(groceryrules, method="grouped")
This graph groups the rules and places the most important groups on the far left. According to the graph, the group containing the strongest rule is the leftmost one: citrus fruit, root vegetables => other vegetables.
plot(groceryrules, method="graph")
This graph relates the support of the rules to their lift. We can see that lift barely drops below 2 (the lowest value among the 15 rules is 1.98). As long as lift is above 1, there is a positive association between the products in a rule.
plot(groceryrules, method="paracoord", control=list(reorder=TRUE))
These graphs show the results of the Apriori algorithm. In the parallel coordinates plot, each arrow runs from the products on the left (starting at position 2) to the product on the right-hand side of the rule, and the more intense the arrow, the more likely that purchase is. For example, when buying yogurt there is a high chance of buying whole milk, and when buying tropical fruit there is a high chance of buying other vegetables.
The Apriori algorithm is a useful tool for learning tendencies from a dataset, for example from customers' purchases. In this project, the Apriori algorithm was applied to a market basket dataset with a minimum support of 1% and a minimum confidence of 50%. The results show that whole milk and other vegetables were the products most often bought after something else: they are the only items that appear on the right-hand side of the 15 rules.
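The statement about the right-hand sides can be checked by tabulating the consequents of the 15 rules. The vector below is copied from the `rhs` column of the `inspect()` output, in the printed order; the name `rhs` is ours.

```r
# Consequent (right-hand side) of rules [1] to [15], as printed by inspect()
rhs <- c("whole milk", "whole milk", "whole milk", "whole milk",
         "whole milk", "whole milk", "other vegetables", "other vegetables",
         "whole milk", "whole milk", "other vegetables", "whole milk",
         "other vegetables", "whole milk", "whole milk")

# Number of rules per consequent
rhs_counts <- table(rhs)
print(rhs_counts)
```

Whole milk is the consequent of 11 rules and other vegetables of the remaining 4.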