Piotr Bugajski

Introduction:

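The aim of the project is to apply an association rule mining technique, the Apriori algorithm, to uncover customers' buying tendencies. The rules are evaluated with three standard measures: support (the share of all transactions that contain a given set of products), confidence (the conditional probability that the right-hand side of a rule is bought given that its left-hand side is bought) and lift (confidence divided by the support of the right-hand side; a value above 1 means the products are bought together more often than they would be if purchases were independent, i.e. a positive association).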
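To make the three measures concrete, here is a minimal base-R sketch on a handful of hypothetical baskets (not the groceries data). The helper names `basket_support`, `basket_confidence` and `basket_lift` are illustrative only; they follow the standard definitions of the measures that arules reports, and are named so that they do not mask `arules::support()`.

```r
# Hypothetical toy baskets, used only to illustrate the definitions
baskets <- list(
  c("whole milk", "yogurt"),
  c("whole milk", "rolls/buns"),
  c("yogurt", "curd", "whole milk"),
  c("soda"),
  c("yogurt", "soda")
)

# support(X): share of baskets that contain every item of X
basket_support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# confidence(X => Y) = support(X and Y) / support(X), i.e. P(Y | X)
basket_confidence <- function(lhs, rhs, baskets) {
  basket_support(c(lhs, rhs), baskets) / basket_support(lhs, baskets)
}

# lift(X => Y) = confidence(X => Y) / support(Y); above 1 means positive association
basket_lift <- function(lhs, rhs, baskets) {
  basket_confidence(lhs, rhs, baskets) / basket_support(rhs, baskets)
}

basket_support(c("yogurt", "whole milk"), baskets)  # 2 of 5 baskets: 0.4
basket_confidence("yogurt", "whole milk", baskets)  # 0.4 / 0.6, about 0.667
basket_lift("yogurt", "whole milk", baskets)        # 0.667 / 0.6, about 1.111
```

In these toy baskets, yogurt buyers pick up whole milk slightly more often than the average shopper does, so the lift is just above 1.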

library(arules)
## Warning: package 'arules' was built under R version 4.3.2
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.3.2
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
## Warning: package 'arulesViz' was built under R version 4.3.2
library(arulesCBA)
## Warning: package 'arulesCBA' was built under R version 4.3.2
# Read the basket-format CSV: one transaction per line, items separated by commas
Gro <- read.transactions("market basket groceries/groceries.csv", format = "basket", sep = ",")

1. Data preparation

Let’s see what the dataset looks like.

summary(Gro)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics

The dataset contains 9835 transactions (rows) over 169 distinct items (columns).

print(sum(size(Gro)))
## [1] 43367

In total, the transactions contain 43367 purchased items.
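The summary figures are consistent with each other: the total item count divided by the number of transactions gives the mean basket size, and divided by the size of the full transaction-by-item matrix gives the reported density. A quick check in base R, using only numbers printed above:

```r
n_transactions <- 9835   # rows reported by summary(Gro)
n_items        <- 169    # distinct items (columns)
n_purchased    <- 43367  # sum(size(Gro))

# Mean basket size: items per transaction (summary reports 4.409)
mean_basket_size <- n_purchased / n_transactions
# Density: share of filled cells in the 9835 x 169 matrix (summary reports 0.02609146)
density <- n_purchased / (n_transactions * n_items)

round(mean_basket_size, 3)
signif(density, 7)
```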

Item frequencies, first as absolute counts and then as the share of transactions containing each item:

itemFrequency(Gro,type="absolute")
##          abrasive cleaner          artif. sweetener            baby cosmetics 
##                        35                        32                         6 
##                 baby food                      bags             baking powder 
##                         1                         4                       174 
##          bathroom cleaner                      beef                   berries 
##                        27                       516                       327 
##                 beverages              bottled beer             bottled water 
##                       256                       792                      1087 
##                    brandy               brown bread                    butter 
##                        41                       638                       545 
##               butter milk                  cake bar                   candles 
##                       275                       130                        88 
##                     candy               canned beer               canned fish 
##                       294                       764                       148 
##              canned fruit         canned vegetables                  cat food 
##                        32                       106                       229 
##                   cereals               chewing gum                   chicken 
##                        56                       207                       422 
##                 chocolate     chocolate marshmallow              citrus fruit 
##                       488                        89                       814 
##                   cleaner           cling film/bags              cocoa drinks 
##                        50                       112                        22 
##                    coffee            condensed milk         cooking chocolate 
##                       571                       101                        25 
##                  cookware                     cream              cream cheese 
##                        27                        13                       390 
##                      curd               curd cheese               decalcifier 
##                       524                        50                        15 
##               dental care                   dessert                 detergent 
##                        57                       365                       189 
##              dish cleaner                    dishes                  dog food 
##                       103                       173                        84 
##             domestic eggs  female sanitary products         finished products 
##                       624                        60                        64 
##                      fish                     flour            flower (seeds) 
##                        29                       171                       102 
##    flower soil/fertilizer               frankfurter            frozen chicken 
##                        19                       580                         6 
##            frozen dessert               frozen fish             frozen fruits 
##                       106                       115                        12 
##              frozen meals    frozen potato products         frozen vegetables 
##                       279                        83                       473 
##     fruit/vegetable juice                    grapes                hair spray 
##                       711                       220                        11 
##                       ham            hamburger meat               hard cheese 
##                       256                       327                       241 
##                     herbs                     honey    house keeping products 
##                       160                        15                        82 
##          hygiene articles                 ice cream            instant coffee 
##                       324                       246                        73 
##     Instant food products                       jam                   ketchup 
##                        79                        53                        42 
##            kitchen towels           kitchen utensil               light bulbs 
##                        59                         4                        41 
##                   liqueur                    liquor        liquor (appetizer) 
##                         9                       109                        78 
##                liver loaf  long life bakery product           make up remover 
##                        50                       368                         8 
##            male cosmetics                 margarine                mayonnaise 
##                        45                       576                        90 
##                      meat              meat spreads           misc. beverages 
##                       254                        42                       279 
##                   mustard                   napkins                newspapers 
##                       118                       515                       785 
##                 nut snack               nuts/prunes                       oil 
##                        31                        33                       276 
##                    onions          organic products           organic sausage 
##                       305                        16                        22 
##          other vegetables packaged fruit/vegetables                     pasta 
##                      1903                       128                       148 
##                    pastry                  pet care                photo/film 
##                       875                        93                        91 
##        pickled vegetables                 pip fruit                   popcorn 
##                       176                       744                        71 
##                      pork           potato products             potted plants 
##                       567                        28                       170 
##     preservation products          processed cheese                  prosecco 
##                         2                       163                        20 
##            pudding powder               ready soups            red/blush wine 
##                        23                        18                       189 
##                      rice             roll products                rolls/buns 
##                        75                       101                      1809 
##           root vegetables           rubbing alcohol                       rum 
##                      1072                        10                        44 
##            salad dressing                      salt               salty snack 
##                         8                       106                       372 
##                    sauces                   sausage         seasonal products 
##                        54                       924                       140 
##       semi-finished bread             shopping bags                 skin care 
##                       174                       969                        35 
##             sliced cheese            snack products                      soap 
##                       241                        30                        26 
##                      soda               soft cheese                  softener 
##                      1715                       168                        54 
##      sound storage medium                     soups            sparkling wine 
##                         1                        67                        55 
##             specialty bar          specialty cheese       specialty chocolate 
##                       269                        84                       299 
##             specialty fat      specialty vegetables                    spices 
##                        36                        17                        51 
##             spread cheese                     sugar             sweet spreads 
##                       110                       333                        89 
##                     syrup                       tea                   tidbits 
##                        32                        38                        23 
##            toilet cleaner            tropical fruit                    turkey 
##                         7                      1032                        80 
##                  UHT-milk                   vinegar                   waffles 
##                       329                        64                       378 
##        whipped/sour cream                    whisky               white bread 
##                       705                         8                       414 
##                white wine                whole milk                    yogurt 
##                       187                      2513                      1372 
##                  zwieback 
##                        68
itemFrequency(Gro,type="relative")
##          abrasive cleaner          artif. sweetener            baby cosmetics 
##              0.0035587189              0.0032536858              0.0006100661 
##                 baby food                      bags             baking powder 
##              0.0001016777              0.0004067107              0.0176919166 
##          bathroom cleaner                      beef                   berries 
##              0.0027452974              0.0524656838              0.0332486019 
##                 beverages              bottled beer             bottled water 
##              0.0260294865              0.0805287239              0.1105236401 
##                    brandy               brown bread                    butter 
##              0.0041687850              0.0648703610              0.0554143366 
##               butter milk                  cake bar                   candles 
##              0.0279613625              0.0132180986              0.0089476360 
##                     candy               canned beer               canned fish 
##              0.0298932384              0.0776817489              0.0150482969 
##              canned fruit         canned vegetables                  cat food 
##              0.0032536858              0.0107778343              0.0232841891 
##                   cereals               chewing gum                   chicken 
##              0.0056939502              0.0210472801              0.0429079817 
##                 chocolate     chocolate marshmallow              citrus fruit 
##              0.0496187087              0.0090493137              0.0827656329 
##                   cleaner           cling film/bags              cocoa drinks 
##              0.0050838841              0.0113879004              0.0022369090 
##                    coffee            condensed milk         cooking chocolate 
##              0.0580579563              0.0102694459              0.0025419420 
##                  cookware                     cream              cream cheese 
##              0.0027452974              0.0013218099              0.0396542959 
##                      curd               curd cheese               decalcifier 
##              0.0532791052              0.0050838841              0.0015251652 
##               dental care                   dessert                 detergent 
##              0.0057956279              0.0371123538              0.0192170819 
##              dish cleaner                    dishes                  dog food 
##              0.0104728012              0.0175902389              0.0085409253 
##             domestic eggs  female sanitary products         finished products 
##              0.0634468734              0.0061006609              0.0065073716 
##                      fish                     flour            flower (seeds) 
##              0.0029486528              0.0173868836              0.0103711235 
##    flower soil/fertilizer               frankfurter            frozen chicken 
##              0.0019318760              0.0589730554              0.0006100661 
##            frozen dessert               frozen fish             frozen fruits 
##              0.0107778343              0.0116929334              0.0012201322 
##              frozen meals    frozen potato products         frozen vegetables 
##              0.0283680732              0.0084392476              0.0480935435 
##     fruit/vegetable juice                    grapes                hair spray 
##              0.0722928317              0.0223690900              0.0011184545 
##                       ham            hamburger meat               hard cheese 
##              0.0260294865              0.0332486019              0.0245043213 
##                     herbs                     honey    house keeping products 
##              0.0162684291              0.0015251652              0.0083375699 
##          hygiene articles                 ice cream            instant coffee 
##              0.0329435689              0.0250127097              0.0074224708 
##     Instant food products                       jam                   ketchup 
##              0.0080325369              0.0053889171              0.0042704626 
##            kitchen towels           kitchen utensil               light bulbs 
##              0.0059989832              0.0004067107              0.0041687850 
##                   liqueur                    liquor        liquor (appetizer) 
##              0.0009150991              0.0110828673              0.0079308592 
##                liver loaf  long life bakery product           make up remover 
##              0.0050838841              0.0374173869              0.0008134215 
##            male cosmetics                 margarine                mayonnaise 
##              0.0045754957              0.0585663447              0.0091509914 
##                      meat              meat spreads           misc. beverages 
##              0.0258261312              0.0042704626              0.0283680732 
##                   mustard                   napkins                newspapers 
##              0.0119979664              0.0523640061              0.0798169802 
##                 nut snack               nuts/prunes                       oil 
##              0.0031520081              0.0033553635              0.0280630402 
##                    onions          organic products           organic sausage 
##              0.0310116929              0.0016268429              0.0022369090 
##          other vegetables packaged fruit/vegetables                     pasta 
##              0.1934926284              0.0130147433              0.0150482969 
##                    pastry                  pet care                photo/film 
##              0.0889679715              0.0094560244              0.0092526690 
##        pickled vegetables                 pip fruit                   popcorn 
##              0.0178952720              0.0756481952              0.0072191154 
##                      pork           potato products             potted plants 
##              0.0576512456              0.0028469751              0.0172852059 
##     preservation products          processed cheese                  prosecco 
##              0.0002033554              0.0165734621              0.0020335536 
##            pudding powder               ready soups            red/blush wine 
##              0.0023385867              0.0018301983              0.0192170819 
##                      rice             roll products                rolls/buns 
##              0.0076258261              0.0102694459              0.1839349263 
##           root vegetables           rubbing alcohol                       rum 
##              0.1089984748              0.0010167768              0.0044738180 
##            salad dressing                      salt               salty snack 
##              0.0008134215              0.0107778343              0.0378240976 
##                    sauces                   sausage         seasonal products 
##              0.0054905948              0.0939501779              0.0142348754 
##       semi-finished bread             shopping bags                 skin care 
##              0.0176919166              0.0985256736              0.0035587189 
##             sliced cheese            snack products                      soap 
##              0.0245043213              0.0030503305              0.0026436197 
##                      soda               soft cheese                  softener 
##              0.1743772242              0.0170818505              0.0054905948 
##      sound storage medium                     soups            sparkling wine 
##              0.0001016777              0.0068124047              0.0055922725 
##             specialty bar          specialty cheese       specialty chocolate 
##              0.0273512964              0.0085409253              0.0304016268 
##             specialty fat      specialty vegetables                    spices 
##              0.0036603965              0.0017285206              0.0051855618 
##             spread cheese                     sugar             sweet spreads 
##              0.0111845450              0.0338586680              0.0090493137 
##                     syrup                       tea                   tidbits 
##              0.0032536858              0.0038637519              0.0023385867 
##            toilet cleaner            tropical fruit                    turkey 
##              0.0007117438              0.1049313676              0.0081342145 
##                  UHT-milk                   vinegar                   waffles 
##              0.0334519573              0.0065073716              0.0384341637 
##        whipped/sour cream                    whisky               white bread 
##              0.0716827656              0.0008134215              0.0420945602 
##                white wine                whole milk                    yogurt 
##              0.0190137265              0.2555160142              0.1395017794 
##                  zwieback 
##              0.0069140824

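The output shows that whole milk appears in about 25.6% of all transactions (2513 of 9835), far more often than any other item. Note that this is the share of baskets that contain whole milk, not the share of all purchases: whole milk accounts for only about 5.8% of the 43367 items bought.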
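The relative frequencies are simply the absolute counts divided by the number of transactions, so they measure the share of baskets containing an item rather than the share of all items bought. Both readings can be reproduced from the counts printed above for whole milk:

```r
whole_milk_count <- 2513   # itemFrequency(Gro, type = "absolute")["whole milk"]
n_transactions   <- 9835   # number of baskets
n_purchased      <- 43367  # sum(size(Gro)), all items bought

# Share of baskets containing whole milk: matches type = "relative" (about 0.2555)
share_of_baskets <- whole_milk_count / n_transactions
# Share of all purchased items that are whole milk (about 0.058)
share_of_items <- whole_milk_count / n_purchased

c(share_of_baskets = share_of_baskets, share_of_items = share_of_items)
```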

itemFrequencyPlot(Gro, topN=15, type="relative")

itemFrequencyPlot(Gro, topN=15, type="absolute")

As noted above, the most frequently bought item is whole milk, followed by other vegetables, rolls/buns, soda and yogurt; these five make up the top 5 by frequency.

We can now move on to mining association rules with the Apriori algorithm.

2. Apriori algorithm

We set the minimum support to 1% and the minimum confidence to 50%, and use minlen = 2 so that every rule has at least one item on each side.
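Apriori finds frequent itemsets level by level: it starts from the single items that meet the minimum support, joins frequent (k-1)-itemsets into k-item candidates, and discards any candidate that has an infrequent subset before counting its support (the Apriori property: every subset of a frequent itemset is itself frequent). Rules are then formed from these itemsets and kept if they reach the minimum confidence. The base-R function below is a minimal, unoptimised sketch of the itemset stage only, run on hypothetical baskets; `apriori_itemsets` is an illustrative name, and this is not how arules implements the algorithm internally.

```r
# Minimal level-wise Apriori sketch (frequent itemsets only), base R.
apriori_itemsets <- function(baskets, min_support) {
  n <- length(baskets)
  # Support of an itemset: share of baskets containing all of its items
  supp <- function(items) {
    sum(vapply(baskets, function(b) all(items %in% b), logical(1))) / n
  }
  # TRUE if itemset s is one of the itemsets in the list lvl
  in_level <- function(s, lvl) any(vapply(lvl, identical, logical(1), s))

  # Level 1: frequent single items (kept sorted so itemsets compare consistently)
  level <- Filter(function(s) supp(s) >= min_support,
                  as.list(sort(unique(unlist(baskets)))))
  result <- level
  k <- 2
  while (length(level) > 1) {
    candidates <- list()
    for (i in seq_len(length(level) - 1)) {
      for (j in (i + 1):length(level)) {
        a <- level[[i]]
        b <- level[[j]]
        # Join step: combine (k-1)-itemsets that share their first k-2 items
        if (!identical(a[seq_len(k - 2)], b[seq_len(k - 2)])) next
        cand <- sort(union(a, b))
        # Prune step: every (k-1)-subset must already be frequent
        subsets <- combn(cand, k - 1, simplify = FALSE)
        if (all(vapply(subsets, in_level, logical(1), level))) {
          candidates[[length(candidates) + 1]] <- cand
        }
      }
    }
    # Count support only for the candidates that survived pruning
    level <- Filter(function(s) supp(s) >= min_support, unique(candidates))
    result <- c(result, level)
    k <- k + 1
  }
  result
}

# Hypothetical toy baskets
baskets <- list(
  c("milk", "yogurt", "curd"),
  c("milk", "yogurt"),
  c("milk", "bread"),
  c("yogurt", "curd"),
  c("milk", "yogurt", "curd")
)
frequent <- apriori_itemsets(baskets, min_support = 0.4)
vapply(frequent, paste, character(1), collapse = ", ")
```

With a minimum support of 0.4 (2 of 5 baskets), bread is dropped at the first level, so no itemset containing it is ever generated; this early pruning is what keeps Apriori tractable on data like the 169-item groceries set.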

groceryrules <- apriori(Gro, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
groceryrules
## set of 15 rules

Running the Apriori algorithm produced 15 rules, which we now inspect and draw conclusions from.

inspect(groceryrules[1:15])
##      lhs                                       rhs                support   
## [1]  {curd, yogurt}                         => {whole milk}       0.01006609
## [2]  {butter, other vegetables}             => {whole milk}       0.01148958
## [3]  {domestic eggs, other vegetables}      => {whole milk}       0.01230300
## [4]  {whipped/sour cream, yogurt}           => {whole milk}       0.01087951
## [5]  {other vegetables, whipped/sour cream} => {whole milk}       0.01464159
## [6]  {other vegetables, pip fruit}          => {whole milk}       0.01352313
## [7]  {citrus fruit, root vegetables}        => {other vegetables} 0.01037112
## [8]  {root vegetables, tropical fruit}      => {other vegetables} 0.01230300
## [9]  {root vegetables, tropical fruit}      => {whole milk}       0.01199797
## [10] {tropical fruit, yogurt}               => {whole milk}       0.01514997
## [11] {root vegetables, yogurt}              => {other vegetables} 0.01291307
## [12] {root vegetables, yogurt}              => {whole milk}       0.01453991
## [13] {rolls/buns, root vegetables}          => {other vegetables} 0.01220132
## [14] {rolls/buns, root vegetables}          => {whole milk}       0.01270971
## [15] {other vegetables, yogurt}             => {whole milk}       0.02226741
##      confidence coverage   lift     count
## [1]  0.5823529  0.01728521 2.279125  99  
## [2]  0.5736041  0.02003050 2.244885 113  
## [3]  0.5525114  0.02226741 2.162336 121  
## [4]  0.5245098  0.02074225 2.052747 107  
## [5]  0.5070423  0.02887646 1.984385 144  
## [6]  0.5175097  0.02613116 2.025351 133  
## [7]  0.5862069  0.01769192 3.029608 102  
## [8]  0.5845411  0.02104728 3.020999 121  
## [9]  0.5700483  0.02104728 2.230969 118  
## [10] 0.5173611  0.02928317 2.024770 149  
## [11] 0.5000000  0.02582613 2.584078 127  
## [12] 0.5629921  0.02582613 2.203354 143  
## [13] 0.5020921  0.02430097 2.594890 120  
## [14] 0.5230126  0.02430097 2.046888 125  
## [15] 0.5128806  0.04341637 2.007235 219
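Each column of the inspect() output follows from simple counts. As an illustration, the sketch below recomputes rule [7], {citrus fruit, root vegetables} => {other vegetables}, from its count (102), the number of transactions (9835) and the absolute frequency of other vegetables (1903). The left-hand-side count of 174 is not printed directly; it is derived here from coverage × 9835 (0.01769192 × 9835 ≈ 174).

```r
n_transactions <- 9835
rule_count <- 102   # transactions containing citrus fruit, root vegetables AND other vegetables
lhs_count  <- 174   # transactions containing citrus fruit and root vegetables (coverage * 9835)
rhs_count  <- 1903  # transactions containing other vegetables

rule_support    <- rule_count / n_transactions  # about 0.01037
rule_coverage   <- lhs_count / n_transactions   # about 0.01769, the support of the LHS
rule_confidence <- rule_count / lhs_count       # about 0.5862
rule_lift       <- rule_confidence / (rhs_count / n_transactions)  # about 3.0296

c(support = rule_support, coverage = rule_coverage,
  confidence = rule_confidence, lift = rule_lift)
```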

Next, we sort the rules by confidence, lift and support in turn to see which rules rank highest under each measure.

sorted_rules1 <- sort(groceryrules, by = "confidence")
inspect(sorted_rules1[1:15])
##      lhs                                       rhs                support   
## [1]  {citrus fruit, root vegetables}        => {other vegetables} 0.01037112
## [2]  {root vegetables, tropical fruit}      => {other vegetables} 0.01230300
## [3]  {curd, yogurt}                         => {whole milk}       0.01006609
## [4]  {butter, other vegetables}             => {whole milk}       0.01148958
## [5]  {root vegetables, tropical fruit}      => {whole milk}       0.01199797
## [6]  {root vegetables, yogurt}              => {whole milk}       0.01453991
## [7]  {domestic eggs, other vegetables}      => {whole milk}       0.01230300
## [8]  {whipped/sour cream, yogurt}           => {whole milk}       0.01087951
## [9]  {rolls/buns, root vegetables}          => {whole milk}       0.01270971
## [10] {other vegetables, pip fruit}          => {whole milk}       0.01352313
## [11] {tropical fruit, yogurt}               => {whole milk}       0.01514997
## [12] {other vegetables, yogurt}             => {whole milk}       0.02226741
## [13] {other vegetables, whipped/sour cream} => {whole milk}       0.01464159
## [14] {rolls/buns, root vegetables}          => {other vegetables} 0.01220132
## [15] {root vegetables, yogurt}              => {other vegetables} 0.01291307
##      confidence coverage   lift     count
## [1]  0.5862069  0.01769192 3.029608 102  
## [2]  0.5845411  0.02104728 3.020999 121  
## [3]  0.5823529  0.01728521 2.279125  99  
## [4]  0.5736041  0.02003050 2.244885 113  
## [5]  0.5700483  0.02104728 2.230969 118  
## [6]  0.5629921  0.02582613 2.203354 143  
## [7]  0.5525114  0.02226741 2.162336 121  
## [8]  0.5245098  0.02074225 2.052747 107  
## [9]  0.5230126  0.02430097 2.046888 125  
## [10] 0.5175097  0.02613116 2.025351 133  
## [11] 0.5173611  0.02928317 2.024770 149  
## [12] 0.5128806  0.04341637 2.007235 219  
## [13] 0.5070423  0.02887646 1.984385 144  
## [14] 0.5020921  0.02430097 2.594890 120  
## [15] 0.5000000  0.02582613 2.584078 127
sorted_rules2 <- sort(groceryrules, by = "lift")
inspect(sorted_rules2[1:15])
##      lhs                                       rhs                support   
## [1]  {citrus fruit, root vegetables}        => {other vegetables} 0.01037112
## [2]  {root vegetables, tropical fruit}      => {other vegetables} 0.01230300
## [3]  {rolls/buns, root vegetables}          => {other vegetables} 0.01220132
## [4]  {root vegetables, yogurt}              => {other vegetables} 0.01291307
## [5]  {curd, yogurt}                         => {whole milk}       0.01006609
## [6]  {butter, other vegetables}             => {whole milk}       0.01148958
## [7]  {root vegetables, tropical fruit}      => {whole milk}       0.01199797
## [8]  {root vegetables, yogurt}              => {whole milk}       0.01453991
## [9]  {domestic eggs, other vegetables}      => {whole milk}       0.01230300
## [10] {whipped/sour cream, yogurt}           => {whole milk}       0.01087951
## [11] {rolls/buns, root vegetables}          => {whole milk}       0.01270971
## [12] {other vegetables, pip fruit}          => {whole milk}       0.01352313
## [13] {tropical fruit, yogurt}               => {whole milk}       0.01514997
## [14] {other vegetables, yogurt}             => {whole milk}       0.02226741
## [15] {other vegetables, whipped/sour cream} => {whole milk}       0.01464159
##      confidence coverage   lift     count
## [1]  0.5862069  0.01769192 3.029608 102  
## [2]  0.5845411  0.02104728 3.020999 121  
## [3]  0.5020921  0.02430097 2.594890 120  
## [4]  0.5000000  0.02582613 2.584078 127  
## [5]  0.5823529  0.01728521 2.279125  99  
## [6]  0.5736041  0.02003050 2.244885 113  
## [7]  0.5700483  0.02104728 2.230969 118  
## [8]  0.5629921  0.02582613 2.203354 143  
## [9]  0.5525114  0.02226741 2.162336 121  
## [10] 0.5245098  0.02074225 2.052747 107  
## [11] 0.5230126  0.02430097 2.046888 125  
## [12] 0.5175097  0.02613116 2.025351 133  
## [13] 0.5173611  0.02928317 2.024770 149  
## [14] 0.5128806  0.04341637 2.007235 219  
## [15] 0.5070423  0.02887646 1.984385 144
sorted_rules3 <- sort(groceryrules, by = "support")
inspect(sorted_rules3[1:15])
##      lhs                                       rhs                support   
## [1]  {other vegetables, yogurt}             => {whole milk}       0.02226741
## [2]  {tropical fruit, yogurt}               => {whole milk}       0.01514997
## [3]  {other vegetables, whipped/sour cream} => {whole milk}       0.01464159
## [4]  {root vegetables, yogurt}              => {whole milk}       0.01453991
## [5]  {other vegetables, pip fruit}          => {whole milk}       0.01352313
## [6]  {root vegetables, yogurt}              => {other vegetables} 0.01291307
## [7]  {rolls/buns, root vegetables}          => {whole milk}       0.01270971
## [8]  {domestic eggs, other vegetables}      => {whole milk}       0.01230300
## [9]  {root vegetables, tropical fruit}      => {other vegetables} 0.01230300
## [10] {rolls/buns, root vegetables}          => {other vegetables} 0.01220132
## [11] {root vegetables, tropical fruit}      => {whole milk}       0.01199797
## [12] {butter, other vegetables}             => {whole milk}       0.01148958
## [13] {whipped/sour cream, yogurt}           => {whole milk}       0.01087951
## [14] {citrus fruit, root vegetables}        => {other vegetables} 0.01037112
## [15] {curd, yogurt}                         => {whole milk}       0.01006609
##      confidence coverage   lift     count
## [1]  0.5128806  0.04341637 2.007235 219  
## [2]  0.5173611  0.02928317 2.024770 149  
## [3]  0.5070423  0.02887646 1.984385 144  
## [4]  0.5629921  0.02582613 2.203354 143  
## [5]  0.5175097  0.02613116 2.025351 133  
## [6]  0.5000000  0.02582613 2.584078 127  
## [7]  0.5230126  0.02430097 2.046888 125  
## [8]  0.5525114  0.02226741 2.162336 121  
## [9]  0.5845411  0.02104728 3.020999 121  
## [10] 0.5020921  0.02430097 2.594890 120  
## [11] 0.5700483  0.02104728 2.230969 118  
## [12] 0.5736041  0.02003050 2.244885 113  
## [13] 0.5245098  0.02074225 2.052747 107  
## [14] 0.5862069  0.01769192 3.029608 102  
## [15] 0.5823529  0.01728521 2.279125  99

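Looking at the output, which rule is strongest depends on the measure used:

{citrus fruit, root vegetables} => {other vegetables} has both the highest confidence (0.586) and the highest lift (3.03)

{other vegetables, yogurt} => {whole milk} has the highest support (0.022, i.e. 219 transactions)

{curd, yogurt} => {whole milk}, with a confidence of 0.582, ranks only third by confidence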
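Because the three measures rank the rules differently, it helps to check the top rule under each one directly. The base-R sketch below types in the support, confidence and lift values printed by inspect() (at their printed precision) and picks the maximum for each measure, reproducing the first row of each sorted listing above.

```r
# Measures of the 15 rules, copied from the inspect(groceryrules) output above
rules <- data.frame(
  rule = c(
    "{curd, yogurt} => {whole milk}",
    "{butter, other vegetables} => {whole milk}",
    "{domestic eggs, other vegetables} => {whole milk}",
    "{whipped/sour cream, yogurt} => {whole milk}",
    "{other vegetables, whipped/sour cream} => {whole milk}",
    "{other vegetables, pip fruit} => {whole milk}",
    "{citrus fruit, root vegetables} => {other vegetables}",
    "{root vegetables, tropical fruit} => {other vegetables}",
    "{root vegetables, tropical fruit} => {whole milk}",
    "{tropical fruit, yogurt} => {whole milk}",
    "{root vegetables, yogurt} => {other vegetables}",
    "{root vegetables, yogurt} => {whole milk}",
    "{rolls/buns, root vegetables} => {other vegetables}",
    "{rolls/buns, root vegetables} => {whole milk}",
    "{other vegetables, yogurt} => {whole milk}"
  ),
  support = c(0.01006609, 0.01148958, 0.01230300, 0.01087951, 0.01464159,
              0.01352313, 0.01037112, 0.01230300, 0.01199797, 0.01514997,
              0.01291307, 0.01453991, 0.01220132, 0.01270971, 0.02226741),
  confidence = c(0.5823529, 0.5736041, 0.5525114, 0.5245098, 0.5070423,
                 0.5175097, 0.5862069, 0.5845411, 0.5700483, 0.5173611,
                 0.5000000, 0.5629921, 0.5020921, 0.5230126, 0.5128806),
  lift = c(2.279125, 2.244885, 2.162336, 2.052747, 1.984385,
           2.025351, 3.029608, 3.020999, 2.230969, 2.024770,
           2.584078, 2.203354, 2.594890, 2.046888, 2.007235),
  stringsAsFactors = FALSE
)

# Top rule under each measure (named by measure)
top <- sapply(c("support", "confidence", "lift"),
              function(m) rules$rule[which.max(rules[[m]])])
top
```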

3. Graphs

plot(groceryrules, method="grouped")

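The grouped matrix plot shows the (grouped) left-hand sides of the rules as columns and the right-hand-side items as rows; balloon size represents support and colour represents lift. The strongest rules appear in the leftmost column, which here is {citrus fruit, root vegetables} => {other vegetables}, the rule with the highest lift.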

plot(groceryrules, method="graph")

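The graph-based plot draws each rule as a node connected to its items; node size represents support and colour represents lift. Lift ranges from about 1.98 to 3.03, so only one rule falls (slightly) below 2. Since every lift is above 1, every rule indicates a positive association: the right-hand-side item is bought together with the left-hand side more often than would be expected if the products were bought independently.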
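Lift can also be read as the observed number of co-occurrences divided by the number expected if both sides were bought independently. For the lowest-lift rule, {other vegetables, whipped/sour cream} => {whole milk}, the rule count (144) and the whole milk count (2513) come from the output above; the left-hand-side count of 284 is derived from coverage × 9835 (0.02887646 × 9835 ≈ 284).

```r
n_transactions <- 9835
lhs_count <- 284   # {other vegetables, whipped/sour cream}, derived from coverage * 9835
rhs_count <- 2513  # whole milk
observed  <- 144   # transactions containing all three items

# Under independence, P(LHS and RHS) = P(LHS) * P(RHS)
expected_count <- n_transactions * (lhs_count / n_transactions) * (rhs_count / n_transactions)
# Observed / expected equals the lift reported by inspect() (about 1.98)
lift_ratio <- observed / expected_count

c(expected = expected_count, lift = lift_ratio)
```

About 73 co-occurrences would be expected by chance, but 144 were observed, so even the weakest rule occurs roughly twice as often as independence predicts.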

plot(groceryrules, method="paracoord", control=list(reorder=TRUE))

The parallel coordinates plot draws each rule as a line running from its left-hand-side items (positions 2 and 1) to its right-hand-side item (rhs); line width represents support and colour intensity represents confidence. Since every rule has two items on its left-hand side, the lines should be read in pairs: for example, yogurt together with other vegetables, root vegetables or tropical fruit leads to whole milk, and tropical fruit together with root vegetables leads to other vegetables.

4. Conclusions

The Apriori algorithm is a useful tool for discovering tendencies in transactional data, such as which products tend to be bought together. In this project it was applied to a market basket dataset of 9835 transactions with a minimum support of 1% and a minimum confidence of 50%, which produced 15 rules. Every rule has either whole milk (11 rules) or other vegetables (4 rules) on its right-hand side, so these two products are the ones most often bought together with other products.