Introduction

This document uses market basket analysis to identify which groceries shoppers typically buy together in a single trip to the grocery store. The data for the analysis come from the file groceries.csv.

library(arules)
## Warning: package 'arules' was built under R version 4.0.3
## 
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
# Read each line of the CSV as one transaction (basket) of comma-separated items
groceries <- read.transactions("groceries.csv", format = "basket", sep = ",")
inspect(head(groceries))
##     items                     
## [1] {citrus fruit,            
##      margarine,               
##      ready soups,             
##      semi-finished bread}     
## [2] {coffee,                  
##      tropical fruit,          
##      yogurt}                  
## [3] {whole milk}              
## [4] {cream cheese,            
##      meat spreads,            
##      pip fruit,               
##      yogurt}                  
## [5] {condensed milk,          
##      long life bakery product,
##      other vegetables,        
##      whole milk}              
## [6] {abrasive cleaner,        
##      butter,                  
##      rice,                    
##      whole milk,              
##      yogurt}
summary(groceries)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics
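
The density and mean basket size reported by summary() can be checked by hand from the figures printed above. The following is a minimal sketch in base R that uses only those printed counts; the variable names are illustrative and are not part of the original analysis.

```r
# Figures copied from the summary(groceries) output above
n_transactions <- 9835
n_items        <- 169

# Counts of the five most frequent items plus the "(Other)" bucket
item_counts <- c(2513, 1903, 1809, 1715, 1372, 34055)
total_items_bought <- sum(item_counts)

# Density = filled cells / all cells of the transaction-by-item matrix
density <- total_items_bought / (n_transactions * n_items)

# Mean basket size = items bought / number of transactions
mean_basket_size <- total_items_bought / n_transactions

print(round(density, 8))
print(round(mean_basket_size, 3))
```

A density of roughly 2.6% means that a typical basket holds only a small fraction of the 169 available items, which is why the sparse format is used.
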
itemFrequency(groceries, type = 'absolute') 
##          abrasive cleaner          artif. sweetener            baby cosmetics 
##                        35                        32                         6 
##                 baby food                      bags             baking powder 
##                         1                         4                       174 
##          bathroom cleaner                      beef                   berries 
##                        27                       516                       327 
##                 beverages              bottled beer             bottled water 
##                       256                       792                      1087 
##                    brandy               brown bread                    butter 
##                        41                       638                       545 
##               butter milk                  cake bar                   candles 
##                       275                       130                        88 
##                     candy               canned beer               canned fish 
##                       294                       764                       148 
##              canned fruit         canned vegetables                  cat food 
##                        32                       106                       229 
##                   cereals               chewing gum                   chicken 
##                        56                       207                       422 
##                 chocolate     chocolate marshmallow              citrus fruit 
##                       488                        89                       814 
##                   cleaner           cling film/bags              cocoa drinks 
##                        50                       112                        22 
##                    coffee            condensed milk         cooking chocolate 
##                       571                       101                        25 
##                  cookware                     cream              cream cheese 
##                        27                        13                       390 
##                      curd               curd cheese               decalcifier 
##                       524                        50                        15 
##               dental care                   dessert                 detergent 
##                        57                       365                       189 
##              dish cleaner                    dishes                  dog food 
##                       103                       173                        84 
##             domestic eggs  female sanitary products         finished products 
##                       624                        60                        64 
##                      fish                     flour            flower (seeds) 
##                        29                       171                       102 
##    flower soil/fertilizer               frankfurter            frozen chicken 
##                        19                       580                         6 
##            frozen dessert               frozen fish             frozen fruits 
##                       106                       115                        12 
##              frozen meals    frozen potato products         frozen vegetables 
##                       279                        83                       473 
##     fruit/vegetable juice                    grapes                hair spray 
##                       711                       220                        11 
##                       ham            hamburger meat               hard cheese 
##                       256                       327                       241 
##                     herbs                     honey    house keeping products 
##                       160                        15                        82 
##          hygiene articles                 ice cream            instant coffee 
##                       324                       246                        73 
##     Instant food products                       jam                   ketchup 
##                        79                        53                        42 
##            kitchen towels           kitchen utensil               light bulbs 
##                        59                         4                        41 
##                   liqueur                    liquor        liquor (appetizer) 
##                         9                       109                        78 
##                liver loaf  long life bakery product           make up remover 
##                        50                       368                         8 
##            male cosmetics                 margarine                mayonnaise 
##                        45                       576                        90 
##                      meat              meat spreads           misc. beverages 
##                       254                        42                       279 
##                   mustard                   napkins                newspapers 
##                       118                       515                       785 
##                 nut snack               nuts/prunes                       oil 
##                        31                        33                       276 
##                    onions          organic products           organic sausage 
##                       305                        16                        22 
##          other vegetables packaged fruit/vegetables                     pasta 
##                      1903                       128                       148 
##                    pastry                  pet care                photo/film 
##                       875                        93                        91 
##        pickled vegetables                 pip fruit                   popcorn 
##                       176                       744                        71 
##                      pork           potato products             potted plants 
##                       567                        28                       170 
##     preservation products          processed cheese                  prosecco 
##                         2                       163                        20 
##            pudding powder               ready soups            red/blush wine 
##                        23                        18                       189 
##                      rice             roll products                rolls/buns 
##                        75                       101                      1809 
##           root vegetables           rubbing alcohol                       rum 
##                      1072                        10                        44 
##            salad dressing                      salt               salty snack 
##                         8                       106                       372 
##                    sauces                   sausage         seasonal products 
##                        54                       924                       140 
##       semi-finished bread             shopping bags                 skin care 
##                       174                       969                        35 
##             sliced cheese            snack products                      soap 
##                       241                        30                        26 
##                      soda               soft cheese                  softener 
##                      1715                       168                        54 
##      sound storage medium                     soups            sparkling wine 
##                         1                        67                        55 
##             specialty bar          specialty cheese       specialty chocolate 
##                       269                        84                       299 
##             specialty fat      specialty vegetables                    spices 
##                        36                        17                        51 
##             spread cheese                     sugar             sweet spreads 
##                       110                       333                        89 
##                     syrup                       tea                   tidbits 
##                        32                        38                        23 
##            toilet cleaner            tropical fruit                    turkey 
##                         7                      1032                        80 
##                  UHT-milk                   vinegar                   waffles 
##                       329                        64                       378 
##        whipped/sour cream                    whisky               white bread 
##                       705                         8                       414 
##                white wine                whole milk                    yogurt 
##                       187                      2513                      1372 
##                  zwieback 
##                        68
itemFrequencyPlot(groceries, topN = 20, type = "absolute", main = "Grocery Item Frequency")

grocrules <- apriori(groceries, parameter = list(supp = 0.1, conf = 0.1))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 983 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [8 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [8 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
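
The log above reports that only 8 items survive the minimum support filter. A rough sketch of that filtering step is shown below, using a handful of the absolute frequencies printed earlier. It is plain base R rather than the arules internals, and the names item_counts and frequent_items are illustrative.

```r
n_transactions <- 9835
min_support    <- 0.1

# Absolute frequencies copied from the itemFrequency() output above
# (the eight most frequent items plus the next two for comparison)
item_counts <- c(
  "whole milk" = 2513, "other vegetables" = 1903, "rolls/buns" = 1809,
  "soda" = 1715, "yogurt" = 1372, "bottled water" = 1087,
  "root vegetables" = 1072, "tropical fruit" = 1032,
  "shopping bags" = 969, "sausage" = 924
)

# Relative support of each single item
item_support <- item_counts / n_transactions

# Items that meet the minimum support threshold
frequent_items <- names(item_support)[item_support >= min_support]
print(frequent_items)
```

Since the log shows only 8 rules after checking itemsets of size 1 and 2, it appears that no pair of items reaches 10% support at this threshold, which is why every rule listed below has an empty left-hand side.
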
grocrules<- sort(grocrules, by = "support", decreasing = TRUE)
inspect(grocrules)
##     lhs    rhs                support   confidence coverage lift count
## [1] {}  => {whole milk}       0.2555160 0.2555160  1        1    2513 
## [2] {}  => {other vegetables} 0.1934926 0.1934926  1        1    1903 
## [3] {}  => {rolls/buns}       0.1839349 0.1839349  1        1    1809 
## [4] {}  => {soda}             0.1743772 0.1743772  1        1    1715 
## [5] {}  => {yogurt}           0.1395018 0.1395018  1        1    1372 
## [6] {}  => {bottled water}    0.1105236 0.1105236  1        1    1087 
## [7] {}  => {root vegetables}  0.1089985 0.1089985  1        1    1072 
## [8] {}  => {tropical fruit}   0.1049314 0.1049314  1        1    1032
grocrules<- sort(grocrules, by = "confidence", decreasing = TRUE)
inspect(grocrules)
##     lhs    rhs                support   confidence coverage lift count
## [1] {}  => {whole milk}       0.2555160 0.2555160  1        1    2513 
## [2] {}  => {other vegetables} 0.1934926 0.1934926  1        1    1903 
## [3] {}  => {rolls/buns}       0.1839349 0.1839349  1        1    1809 
## [4] {}  => {soda}             0.1743772 0.1743772  1        1    1715 
## [5] {}  => {yogurt}           0.1395018 0.1395018  1        1    1372 
## [6] {}  => {bottled water}    0.1105236 0.1105236  1        1    1087 
## [7] {}  => {root vegetables}  0.1089985 0.1089985  1        1    1072 
## [8] {}  => {tropical fruit}   0.1049314 0.1049314  1        1    1032
grocrules<- sort(grocrules, by = "coverage", decreasing = TRUE)
inspect(grocrules)
##     lhs    rhs                support   confidence coverage lift count
## [1] {}  => {whole milk}       0.2555160 0.2555160  1        1    2513 
## [2] {}  => {other vegetables} 0.1934926 0.1934926  1        1    1903 
## [3] {}  => {rolls/buns}       0.1839349 0.1839349  1        1    1809 
## [4] {}  => {soda}             0.1743772 0.1743772  1        1    1715 
## [5] {}  => {yogurt}           0.1395018 0.1395018  1        1    1372 
## [6] {}  => {bottled water}    0.1105236 0.1105236  1        1    1087 
## [7] {}  => {root vegetables}  0.1089985 0.1089985  1        1    1072 
## [8] {}  => {tropical fruit}   0.1049314 0.1049314  1        1    1032
grocrules<- sort(grocrules, by = "lift", decreasing = TRUE)
inspect(grocrules)
##     lhs    rhs                support   confidence coverage lift count
## [1] {}  => {rolls/buns}       0.1839349 0.1839349  1        1    1809 
## [2] {}  => {yogurt}           0.1395018 0.1395018  1        1    1372 
## [3] {}  => {whole milk}       0.2555160 0.2555160  1        1    2513 
## [4] {}  => {other vegetables} 0.1934926 0.1934926  1        1    1903 
## [5] {}  => {soda}             0.1743772 0.1743772  1        1    1715 
## [6] {}  => {bottled water}    0.1105236 0.1105236  1        1    1087 
## [7] {}  => {root vegetables}  0.1089985 0.1089985  1        1    1072 
## [8] {}  => {tropical fruit}   0.1049314 0.1049314  1        1    1032
grocrules<- sort(grocrules, by = "count", decreasing = TRUE)
inspect(grocrules)
##     lhs    rhs                support   confidence coverage lift count
## [1] {}  => {whole milk}       0.2555160 0.2555160  1        1    2513 
## [2] {}  => {other vegetables} 0.1934926 0.1934926  1        1    1903 
## [3] {}  => {rolls/buns}       0.1839349 0.1839349  1        1    1809 
## [4] {}  => {soda}             0.1743772 0.1743772  1        1    1715 
## [5] {}  => {yogurt}           0.1395018 0.1395018  1        1    1372 
## [6] {}  => {bottled water}    0.1105236 0.1105236  1        1    1087 
## [7] {}  => {root vegetables}  0.1089985 0.1089985  1        1    1072 
## [8] {}  => {tropical fruit}   0.1049314 0.1049314  1        1    1032
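# Mine rules that predict whole milk: any items on the LHS, whole milk fixed on the RHS
rules.mw <- apriori(data = groceries, parameter = list(supp = 0.001, conf = 0.08),
                    appearance = list(default = "lhs", rhs = "whole milk"),
                    control = list(verbose = FALSE))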
rules.mw.byconf<- sort(rules.mw, by = "confidence", decreasing = TRUE)
inspect(head(rules.mw.byconf))
##     lhs                     rhs              support confidence    coverage     lift count
## [1] {rice,                                                                                
##      sugar}              => {whole milk} 0.001220132          1 0.001220132 3.913649    12
## [2] {canned fish,                                                                         
##      hygiene articles}   => {whole milk} 0.001118454          1 0.001118454 3.913649    11
## [3] {butter,                                                                              
##      rice,                                                                                
##      root vegetables}    => {whole milk} 0.001016777          1 0.001016777 3.913649    10
## [4] {flour,                                                                               
##      root vegetables,                                                                     
##      whipped/sour cream} => {whole milk} 0.001728521          1 0.001728521 3.913649    17
## [5] {butter,                                                                              
##      domestic eggs,                                                                       
##      soft cheese}        => {whole milk} 0.001016777          1 0.001016777 3.913649    10
## [6] {butter,                                                                              
##      hygiene articles,                                                                    
##      pip fruit}          => {whole milk} 0.001016777          1 0.001016777 3.913649    10
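# Mine rules with whole milk fixed on the LHS (this rule set is not inspected below)
grocrules.mw <- apriori(data = groceries, parameter = list(supp = 0.001, conf = 0.08),
                        appearance = list(default = "rhs", lhs = "whole milk"),
                        control = list(verbose = FALSE))
# Note: the output below comes from rules.mw (whole milk on the RHS),
# sorted by ascending support, i.e. the least-supported rules first
rules.mw.bysupp <- sort(rules.mw, by = "support", decreasing = FALSE)
inspect(head(rules.mw.bysupp))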
##     lhs                           rhs          support     confidence
## [1] {sparkling wine}           => {whole milk} 0.001016777 0.1818182 
## [2] {liver loaf,yogurt}        => {whole milk} 0.001016777 0.6666667 
## [3] {curd cheese,rolls/buns}   => {whole milk} 0.001016777 0.6250000 
## [4] {cleaner,other vegetables} => {whole milk} 0.001016777 0.6250000 
## [5] {cereals,curd}             => {whole milk} 0.001016777 0.9090909 
## [6] {cereals,root vegetables}  => {whole milk} 0.001016777 0.7692308 
##     coverage    lift      count
## [1] 0.005592272 0.7115726 10   
## [2] 0.001525165 2.6090994 10   
## [3] 0.001626843 2.4460306 10   
## [4] 0.001626843 2.4460306 10   
## [5] 0.001118454 3.5578628 10   
## [6] 0.001321810 3.0104993 10
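
The support, confidence, coverage, and lift columns in the tables above can be reproduced from raw transaction counts. The sketch below does this in base R for two of the rules shown; the helper rule_metrics is hypothetical, and the left-hand-side counts (12 and 55) are derived from the printed coverage values rather than read from the data.

```r
n_transactions <- 9835
count_rhs      <- 2513   # transactions containing whole milk

# Recompute the standard rule metrics from raw counts
rule_metrics <- function(count_lhs, count_both) {
  support    <- count_both / n_transactions   # share of baskets with LHS and RHS
  coverage   <- count_lhs / n_transactions    # share of baskets with the LHS
  confidence <- count_both / count_lhs        # P(RHS | LHS)
  lift       <- confidence / (count_rhs / n_transactions)
  c(support = support, confidence = confidence,
    coverage = coverage, lift = lift)
}

# {rice, sugar} => {whole milk}: 12 baskets contain the LHS, all 12 with whole milk
rice_sugar <- rule_metrics(count_lhs = 12, count_both = 12)

# {sparkling wine} => {whole milk}: 55 baskets with sparkling wine, 10 with both
sparkling <- rule_metrics(count_lhs = 55, count_both = 10)

print(round(rbind(rice_sugar, sparkling), 7))
```

A lift above 1, as for {rice, sugar}, means whole milk appears more often in those baskets than in baskets overall; a lift below 1, as for {sparkling wine}, means it appears less often.
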

Conclusion

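The data set contains 9,835 transactions covering 169 distinct items, with a median basket of 3 items and a mean of about 4.4. Whole milk was the most frequently purchased item, appearing in 2,513 transactions (about 25.6%), followed by other vegetables, rolls/buns, soda, and yogurt. At the other end of the item frequency list, items such as baby food (1 transaction), preservation products (2), and kitchen utensils (4) were almost never bought.

With minimum support and confidence both set to 0.1, apriori generated 8 rules. All 8 have an empty left-hand side, so they only restate the single items that appear in at least 10% of transactions, and each has a coverage and lift of 1. Sorting by support, confidence, coverage, or count gives the same order; sorting by lift gives a different order, but because every rule has the same displayed lift of 1, that difference does not reflect any difference in association strength. Lowering the minimum support to 0.001 and fixing whole milk on the right-hand side produced more informative rules, for example {rice, sugar} => {whole milk} with a confidence of 1 and a lift of about 3.91, although the top rules shown rest on only 10 to 17 transactions each.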