1.Introduction

Companies do whatever they can to survive and to outperform competitors in their field. In retail, this largely means becoming ever more attractive to customers while earning as much as possible from each person who enters the shop. Neither is possible without a deep knowledge of customer behaviour, including, and perhaps especially, the behaviour that customers themselves are unaware of.

This project was inspired by the seventh chapter of the book “The Power of Habit” by Charles Duhigg. The book is available online here. For many years retailers have used different psychological tricks that directly target the decisions customers make in a store. Among others, fruits and vegetables are often placed close to the entrance, because people who already have healthy food in their basket are supposed to be more likely to buy snacks and other unhealthy food afterwards. Indeed, in almost any shop you are likely to see this kind of layout. In his book the author focuses on the Target Corporation and how it went one or even two steps further by creating personalised ads and coupons based on many individual details rather than on the behaviour of the majority, so that the offer can be tailored to each customer separately. Nowadays personalised advertising is indeed powerful and present almost everywhere. However, as long as brick-and-mortar shops exist and the retail market is not taken over entirely by e-shopping, product arrangement remains a relevant problem and may still be a key to optimising revenue. It is also extremely important to attract not so much the regular clients of a specific shop as those who visit it for the first time, or who shop in many places and are not convinced that one shop is better than another. These are the people the company knows very little about and has no effective way of reaching with a personalised offer.

The world is changing and so are people’s choices and eating habits. On the one hand, more and more people are obese; on the other hand, healthy lifestyles and special diets are probably more popular than ever, so vegetables and snacks may not end up in the same basket as often as before. It is possible that rules which have been known and used for many years are no longer the most important ones from the company’s perspective, or no longer hold at all. Fortunately, with the large amounts of data gathered these days, it is fairly easy to investigate current customer behaviour and compare it with those long-established rules. Market basket analysis is a technique that can help with that.

2.Libraries

library(arules)
library(arulesViz)

3.Database

The database used in this analysis was downloaded from kaggle.com and can be found here. It contains 7501 transactions from grocery stores covering 119 different products. Based on this data, a market basket analysis is performed with a focus on food generally considered healthy or unhealthy.

# Read the CSV as transactions: one basket per line, items separated by commas
trans <- read.transactions("Market_Basket_Optimisation.csv", format = "basket", sep = ",", skip = 0)
trans
## transactions in sparse format with
##  7501 transactions (rows) and
##  119 items (columns)
LIST(head(trans))
## [[1]]
##  [1] "almonds"           "antioxydant juice" "avocado"          
##  [4] "cottage cheese"    "energy drink"      "frozen smoothie"  
##  [7] "green grapes"      "green tea"         "honey"            
## [10] "low fat yogurt"    "mineral water"     "olive oil"        
## [13] "salad"             "salmon"            "shrimp"           
## [16] "spinach"           "tomato juice"      "vegetables mix"   
## [19] "whole weat flour"  "yams"             
## 
## [[2]]
## [1] "burgers"   "eggs"      "meatballs"
## 
## [[3]]
## [1] "chutney"
## 
## [[4]]
## [1] "avocado" "turkey" 
## 
## [[5]]
## [1] "energy bar"       "green tea"        "milk"            
## [4] "mineral water"    "whole wheat rice"
## 
## [[6]]
## [1] "low fat yogurt"
sort(itemFrequency(trans, type="absolute"))
##          water spray              napkins                cream 
##                    3                    5                    7 
##              bramble                  tea              chutney 
##                   14                   29                   31 
##        mashed potato      chocolate bread         dessert wine 
##                   31                   32                   33 
##              ketchup              oatmeal          babies food 
##                   33                   33                   34 
##             sandwich            asparagus          cauliflower 
##                   34                   36                   36 
##                 corn                salad              shampoo 
##                   36                   37                   37 
##     hand protein bar       mint green tea         burger sauce 
##                   39                   42                   44 
##              pickles                chili           mayonnaise 
##                   45                   46                   46 
##                 soda      sparkling water             pet food 
##                   47                   47                   49 
##      gluten free bar              spinach              shallot 
##                   52                   53                   58 
##        strong cheese           toothpaste  clothes accessories 
##                   58                   61                   63 
##                bacon            bug spray          green beans 
##                   65                   65                   65 
##    antioxydant juice            flax seed         green grapes 
##                   67                   68                   68 
##          blueberries                 salt     whole weat flour 
##                   69                   69                   70 
##             zucchini           candy bars          nonfat milk 
##                   71                   73                   78 
##                cider       barbecue sauce            magazines 
##                   79                   81                   82 
##           body spray                 yams extra dark chocolate 
##                   86                   86                   90 
##               melons             eggplant                 gums 
##                   90                   99                  101 
##        fromage blanc         tomato sauce            black tea 
##                  102                  106                  107 
##              carrots          light cream                pasta 
##                  115                  117                  118 
##           white wine                 mint          protein bar 
##                  124                  131                  139 
##                 rice mushroom cream sauce      parmesan cheese 
##                  141                  143                  149 
##              almonds            meatballs         strawberries 
##                  153                  157                  160 
##           fresh tuna          french wine                  oil 
##                  167                  169                  173 
##              muffins              cereals       vegetables mix 
##                  181                  193                  193 
##                  ham               pepper         energy drink 
##                  199                  199                  200 
##           energy bar           light mayo          yogurt cake 
##                  203                  204                  205 
##             red wine    whole wheat pasta               butter 
##                  211                  221                  226 
##         tomato juice       cottage cheese             hot dogs 
##                  228                  239                  243 
##              avocado             brownies               salmon 
##                  250                  253                  319 
##          fresh bread            champagne                honey 
##                  323                  351                  356 
##        herb & pepper                 soup          cooking oil 
##                  371                  379                  383 
##        grated cheese     whole wheat rice              chicken 
##                  393                  439                  450 
##               turkey      frozen smoothie            olive oil 
##                  469                  475                  494 
##             tomatoes               shrimp       low fat yogurt 
##                  513                  536                  574 
##             escalope              cookies                 cake 
##                  595                  603                  608 
##              burgers             pancakes    frozen vegetables 
##                  654                  713                  715 
##          ground beef                 milk            green tea 
##                  737                  972                  991 
##            chocolate         french fries            spaghetti 
##                 1229                 1282                 1306 
##                 eggs        mineral water 
##                 1348                 1788
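
The absolute counts above can be turned into relative frequencies (the support of each single item) by dividing by the number of transactions. A minimal sketch in base R, using three counts copied from the output above; the names `n_trans`, `item_counts` and `item_support` are only illustrative:

```r
# Number of transactions reported by read.transactions() above
n_trans <- 7501

# Absolute counts copied from the itemFrequency() output (illustrative subset)
item_counts <- c("mineral water" = 1788, "eggs" = 1348, "water spray" = 3)

# Relative frequency = share of transactions that contain the item
item_support <- item_counts / n_trans
round(item_support, 4)
```

In arules the same shares for all items should be available directly through itemFrequency(trans, type="relative").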

4.Association Rules

4.1.Grouping products

The database contains over 100 different items. However, products in a store have to be arranged in an order that is logical to customers, so that they do not get frustrated when they cannot find something. For that reason the products are grouped into 19 categories.

names.real<-c("almonds", "antioxydant juice", "asparagus", "avocado", "other", "bacon", "barbecue sauce", "black tea", "blueberries", "body spray", "bramble", "brownies", "bug spray", "burger sauce", "burgers", "butter", "cake", "candy bars", "carrots", "cauliflower", "cereals", "champagne", "chicken", "chili", "chocolate", "chocolate bread", "chutney", "cider", "clothes accessories" ,"cookies", "cooking oil", "corn", "cottage cheese", "cream", "dessert wine", "eggplant", "eggs", "energy bar", "energy drink", "escalope", "extra dark chocolate", "flax seed", "french fries", "french wine", "fresh bread", "fresh tuna", "fromage blanc", "frozen smoothie", "frozen vegetables", "gluten free bar", "grated cheese", "green beans", "green grapes", "green tea", "ground beef", "gums", "ham", "hand protein bar", "herb & pepper", "honey", "hot dogs", "ketchup", "light cream", "light mayo", "low fat yogurt", "magazines", "mashed potato", "mayonnaise", "meatballs", "melons", "milk", "mineral water", "mint", "mint green tea", "muffins", "mushroom cream sauce", "napkins", "nonfat milk", "oatmeal", "oil", "olive oil", "pancakes", "parmesan cheese", "pasta", "pepper", "pet food", "pickles", "protein bar", "red wine", "rice", "salad", "salmon", "salt", "sandwich", "shallot", "shampoo", "shrimp", "soda", "soup", "spaghetti", "sparkling water", "spinach", "strawberries", "strong cheese", "tea", "tomato juice", "tomato sauce", "tomatoes", "toothpaste", "turkey", "vegetables mix", "water spray", "white wine", "whole weat flour", "whole wheat pasta", "whole wheat rice", "yams", "yogurt cake", "zucchini")

names.level1<-c("nuts and seeds", "beverages", "vegetables", "fruits", "other", "meat", "sauces and spices", "beverages", "fruits", "cosmetics", "fruits", "snacks", "other", "sauces and spices", "fast food meals", "dairy products", "snacks", "snacks", "vegetables", "vegetables", "cereals", "alcoholic", "meat", "vegetables", "snacks", "snacks", "sauces and spices", "alcoholic", "other", "snacks", "oils and flour", "vegetables", "dairy products", "dairy products", "alcoholic", "vegetables", "dairy products", "snacks", "beverages", "meat", "snacks", "nuts and seeds", "fast food meals", "alcoholic", "bread", "fishes", "dairy products", "beverages", "vegetables", "snacks", "dairy products", "vegetables", "fruits", "beverages", "meat", "snacks", "meat", "snacks", "sauces and spices", "honey", "fast food meals", "sauces and spices", "cosmetics", "sauces and spices", "dairy products", "other", "vegetables", "sauces and spices", "meat", "fruits", "dairy products" ,"beverages", "sauces and spices", "beverages", "snacks", "sauces and spices","other", "dairy products", "cereals", "oils and flour", "oils and flour", "ready-made meals", "dairy products", "pasta and rice", "vegetables", "other", "vegetables", "snacks", "alcoholic", "pasta and rice", "vegetables", "fishes", "sauces and spices", "bread", "vegetables", "cosmetics", "fishes", "beverages", "ready-made meals", "ready-made meals", "beverages", "vegetables", "fruits", "dairy products", "beverages", "beverages", "sauces and spices", "vegetables", "cosmetics", "meat", "vegetables", "cosmetics", "alcoholic", "oils and flour", "pasta and rice", "pasta and rice", "vegetables", "snacks", "vegetables")

itemInfo(trans) <- data.frame(labels = names.real, level1 = names.level1)
itemInfo(trans)
##                   labels            level1
## 1                almonds    nuts and seeds
## 2      antioxydant juice         beverages
## 3              asparagus        vegetables
## 4                avocado            fruits
## 5                  other             other
## 6                  bacon              meat
## 7         barbecue sauce sauces and spices
## 8              black tea         beverages
## 9            blueberries            fruits
## 10            body spray         cosmetics
## 11               bramble            fruits
## 12              brownies            snacks
## 13             bug spray             other
## 14          burger sauce sauces and spices
## 15               burgers   fast food meals
## 16                butter    dairy products
## 17                  cake            snacks
## 18            candy bars            snacks
## 19               carrots        vegetables
## 20           cauliflower        vegetables
## 21               cereals           cereals
## 22             champagne         alcoholic
## 23               chicken              meat
## 24                 chili        vegetables
## 25             chocolate            snacks
## 26       chocolate bread            snacks
## 27               chutney sauces and spices
## 28                 cider         alcoholic
## 29   clothes accessories             other
## 30               cookies            snacks
## 31           cooking oil    oils and flour
## 32                  corn        vegetables
## 33        cottage cheese    dairy products
## 34                 cream    dairy products
## 35          dessert wine         alcoholic
## 36              eggplant        vegetables
## 37                  eggs    dairy products
## 38            energy bar            snacks
## 39          energy drink         beverages
## 40              escalope              meat
## 41  extra dark chocolate            snacks
## 42             flax seed    nuts and seeds
## 43          french fries   fast food meals
## 44           french wine         alcoholic
## 45           fresh bread             bread
## 46            fresh tuna            fishes
## 47         fromage blanc    dairy products
## 48       frozen smoothie         beverages
## 49     frozen vegetables        vegetables
## 50       gluten free bar            snacks
## 51         grated cheese    dairy products
## 52           green beans        vegetables
## 53          green grapes            fruits
## 54             green tea         beverages
## 55           ground beef              meat
## 56                  gums            snacks
## 57                   ham              meat
## 58      hand protein bar            snacks
## 59         herb & pepper sauces and spices
## 60                 honey             honey
## 61              hot dogs   fast food meals
## 62               ketchup sauces and spices
## 63           light cream         cosmetics
## 64            light mayo sauces and spices
## 65        low fat yogurt    dairy products
## 66             magazines             other
## 67         mashed potato        vegetables
## 68            mayonnaise sauces and spices
## 69             meatballs              meat
## 70                melons            fruits
## 71                  milk    dairy products
## 72         mineral water         beverages
## 73                  mint sauces and spices
## 74        mint green tea         beverages
## 75               muffins            snacks
## 76  mushroom cream sauce sauces and spices
## 77               napkins             other
## 78           nonfat milk    dairy products
## 79               oatmeal           cereals
## 80                   oil    oils and flour
## 81             olive oil    oils and flour
## 82              pancakes  ready-made meals
## 83       parmesan cheese    dairy products
## 84                 pasta    pasta and rice
## 85                pepper        vegetables
## 86              pet food             other
## 87               pickles        vegetables
## 88           protein bar            snacks
## 89              red wine         alcoholic
## 90                  rice    pasta and rice
## 91                 salad        vegetables
## 92                salmon            fishes
## 93                  salt sauces and spices
## 94              sandwich             bread
## 95               shallot        vegetables
## 96               shampoo         cosmetics
## 97                shrimp            fishes
## 98                  soda         beverages
## 99                  soup  ready-made meals
## 100            spaghetti  ready-made meals
## 101      sparkling water         beverages
## 102              spinach        vegetables
## 103         strawberries            fruits
## 104        strong cheese    dairy products
## 105                  tea         beverages
## 106         tomato juice         beverages
## 107         tomato sauce sauces and spices
## 108             tomatoes        vegetables
## 109           toothpaste         cosmetics
## 110               turkey              meat
## 111       vegetables mix        vegetables
## 112          water spray         cosmetics
## 113           white wine         alcoholic
## 114     whole weat flour    oils and flour
## 115    whole wheat pasta    pasta and rice
## 116     whole wheat rice    pasta and rice
## 117                 yams        vegetables
## 118          yogurt cake            snacks
## 119             zucchini        vegetables
trans_level2<-aggregate(trans, by="level1")
itemFrequencyPlot(trans_level2, topN=20, type="relative", main="Item Frequency") 
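
What the aggregation does can be sketched in base R on a toy example. The assumption here is that a transaction contains a category as soon as it contains at least one item from it; the matrix `items` and the mapping `groups` are made up for illustration and are not part of the dataset:

```r
# Toy incidence matrix: rows are transactions, columns are items (TRUE = bought)
items <- matrix(
  c(TRUE,  FALSE, TRUE,
    FALSE, TRUE,  FALSE,
    TRUE,  TRUE,  FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("avocado", "melons", "cake"))
)

# Hypothetical item-to-category mapping, in the same order as the columns
groups <- c("fruits", "fruits", "snacks")

# A transaction contains a category if it contains at least one of its items
by_group <- sapply(unique(groups), function(g) {
  rowSums(items[, groups == g, drop = FALSE]) > 0
})
by_group
```

Under this assumption the frequency of a category is not the sum of its item counts: the third toy basket holds both avocado and melons but counts only once for fruits.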

4.2.Two-dimensional frequency inspection

Two-dimensional analysis shows the connection between every pair of product categories in the dataset.

# How many transactions contain both types of products

ctab<-crossTable(trans_level2, sort=TRUE)
ctab
##                   beverages dairy products snacks meat ready-made meals
## beverages              3218           1555   1432 1154             1119
## dairy products         1555           3169   1391 1116             1089
## snacks                 1432           1391   3151  923              918
## meat                   1154           1116    923 2225              868
## ready-made meals       1119           1089    918  868             2073
## vegetables             1088           1005    899  777              780
## fast food meals         907            932    878  648              585
## sauces and spices       587            561    514  478              463
## oils and flour          592            573    499  439              475
## fishes                  540            539    470  387              412
## alcoholic               441            427    404  308              323
## pasta and rice          487            463    405  381              312
## fruits                  327            292    276  227              233
## honey                   197            174    162  148              134
## bread                   187            188    167  134              114
## cosmetics               173            146    165  135              151
## other                   161            153    136  119              108
## cereals                 145            127    108   86              102
## nuts and seeds          138            131    107   91               87
##                   vegetables fast food meals sauces and spices
## beverages               1088             907               587
## dairy products          1005             932               561
## snacks                   899             878               514
## meat                     777             648               478
## ready-made meals         780             585               463
## vegetables              1974             588               381
## fast food meals          588            1940               318
## sauces and spices        381             318              1144
## oils and flour           401             280               252
## fishes                   447             253               216
## alcoholic                284             272               164
## pasta and rice           316             283               180
## fruits                   220             199               100
## honey                    132             110                67
## bread                    137             106                83
## cosmetics                115              96                73
## other                    106              81                77
## cereals                   91              71                66
## nuts and seeds            89              86                42
##                   oils and flour fishes alcoholic pasta and rice fruits
## beverages                    592    540       441            487    327
## dairy products               573    539       427            463    292
## snacks                       499    470       404            405    276
## meat                         439    387       308            381    227
## ready-made meals             475    412       323            312    233
## vegetables                   401    447       284            316    220
## fast food meals              280    253       272            283    199
## sauces and spices            252    216       164            180    100
## oils and flour              1032    226       151            193    127
## fishes                       226    961       155            176    120
## alcoholic                    151    155       904            145     90
## pasta and rice               193    176       145            881     93
## fruits                       127    120        90             93    616
## honey                         74     79        51             57     48
## bread                         73     75        70             53     50
## cosmetics                     87     57        43             49     49
## other                         61     61        55             54     32
## cereals                       61     49        42             38     27
## nuts and seeds                48     48        35             31     28
##                   honey bread cosmetics other cereals nuts and seeds
## beverages           197   187       173   161     145            138
## dairy products      174   188       146   153     127            131
## snacks              162   167       165   136     108            107
## meat                148   134       135   119      86             91
## ready-made meals    134   114       151   108     102             87
## vegetables          132   137       115   106      91             89
## fast food meals     110   106        96    81      71             86
## sauces and spices    67    83        73    77      66             42
## oils and flour       74    73        87    61      61             48
## fishes               79    75        57    61      49             48
## alcoholic            51    70        43    55      42             35
## pasta and rice       57    53        49    54      38             31
## fruits               48    50        49    32      27             28
## honey               356    25        23    16      15             16
## bread                25   355        20    28      23             16
## cosmetics            23    20       302    22      17              9
## other                16    28        22   290      15             18
## cereals              15    23        17    15     225             11
## nuts and seeds       16    16         9    18      11            219
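
Conceptually, such a table of co-occurrence counts is the cross-product of the binary transaction-by-category matrix with itself. A small base R sketch on a made-up matrix `m` (not the real data) shows the idea:

```r
# Toy incidence matrix: 4 transactions, 3 product categories (1 = present)
m <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 1,
    0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("beverages", "snacks", "meat"))
)

# t(m) %*% m: entry [i, j] counts the transactions that contain both i and j;
# the diagonal holds the number of transactions that contain each category
co_counts <- crossprod(m)
co_counts
```

Read this way, the diagonal of the table above gives the number of transactions per category (for example 3218 for beverages), and every off-diagonal entry is bounded by the smaller of the two diagonal values.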

The number of products in each category of course has a major impact on the number of transactions in which that category occurs together with others. A chi-squared measure can also be computed for each pair to assess the null hypothesis that the two categories occur independently of each other.

chi2tab<-crossTable(trans_level2, measure="chiSquared", sort=TRUE)
chi2tab
##                      beverages dairy products       snacks         meat
## beverages                   NA   0.0037466879 6.341890e-04 5.556089e-03
## dairy products    0.0037466879             NA 3.578228e-04 4.392554e-03
## snacks            0.0006341890   0.0003578228           NA 1.943191e-05
## meat              0.0055560889   0.0043925545 1.943191e-05           NA
## ready-made meals  0.0079067527   0.0069194876 3.407711e-04 1.388761e-02
## vegetables        0.0091535267   0.0046760292 7.825446e-04 8.345843e-03
## fast food meals   0.0008943448   0.0020547881 6.503093e-04 1.219169e-03
## sauces and spices 0.0025145310   0.0016647181 3.100548e-04 7.553312e-03
## oils and flour    0.0067086344   0.0057393520 1.318536e-03 7.689797e-03
## fishes            0.0052749863   0.0058083803 1.451877e-03 4.860123e-03
## alcoholic         0.0009720034   0.0007094019 2.064478e-04 7.894749e-04
## pasta and rice    0.0041940242   0.0029529230 4.390582e-04 7.305932e-03
## fruits            0.0019851173   0.0005165335 1.529892e-04 1.430406e-03
## honey             0.0017109393   0.0004936112 1.382333e-04 2.269692e-03
## bread             0.0010541092   0.0012849576 2.855593e-04 1.042620e-03
## cosmetics         0.0019416435   0.0003542176 1.528374e-03 3.069946e-03
## other             0.0014344176   0.0010110142 2.199670e-04 1.685482e-03
## cereals           0.0032451062   0.0014309939 2.563991e-04 7.408843e-04
## nuts and seeds    0.0027529674   0.0021332815 3.261877e-04 1.391436e-03
##                   ready-made meals   vegetables fast food meals
## beverages             0.0079067527 0.0091535267    8.943448e-04
## dairy products        0.0069194876 0.0046760292    2.054788e-03
## snacks                0.0003407711 0.0007825446    6.503093e-04
## meat                  0.0138876089 0.0083458428    1.219169e-03
## ready-made meals                NA 0.0134334598    5.935067e-04
## vegetables            0.0134334598           NA    1.566776e-03
## fast food meals       0.0005935067 0.0015667760              NA
## sauces and spices     0.0090921524 0.0028297493    2.205618e-04
## oils and flour        0.0168376867 0.0082211848    8.560581e-05
## fishes                0.0107608982 0.0198597562    1.064308e-05
## alcoholic             0.0028567432 0.0011908743    8.319115e-04
## pasta and rice        0.0025710529 0.0040719540    1.779244e-03
## fruits                0.0030845370 0.0027560396    1.317702e-03
## honey                 0.0017187382 0.0020888237    4.653297e-04
## bread                 0.0003431462 0.0027097444    2.921890e-04
## cosmetics             0.0072860873 0.0021168712    5.464640e-04
## other                 0.0012906225 0.0015390273    6.391755e-05
## cereals               0.0033992524 0.0022750653    3.758046e-04
## nuts and seeds        0.0015441047 0.0022758912    2.028864e-03
##                   sauces and spices oils and flour       fishes
## beverages              2.514531e-03   6.708634e-03 5.274986e-03
## dairy products         1.664718e-03   5.739352e-03 5.808380e-03
## snacks                 3.100548e-04   1.318536e-03 1.451877e-03
## meat                   7.553312e-03   7.689797e-03 4.860123e-03
## ready-made meals       9.092152e-03   1.683769e-02 1.076090e-02
## vegetables             2.829749e-03   8.221185e-03 1.985976e-02
## fast food meals        2.205618e-04   8.560581e-05 1.064308e-05
## sauces and spices                NA   7.581184e-03 4.385383e-03
## oils and flour         7.581184e-03             NA 8.868592e-03
## fishes                 4.385383e-03   8.868592e-03           NA
## alcoholic              6.601250e-04   7.599232e-04 1.767264e-03
## pasta and rice         2.066399e-03   5.668643e-03 4.707242e-03
## fruits                 5.197447e-05   2.807934e-03 2.850789e-03
## honey                  3.963682e-04   1.704030e-03 3.258933e-03
## bread                  2.050570e-03   1.593061e-03 2.554134e-03
## cosmetics              2.100859e-03   6.628078e-03 1.155031e-03
## other                  3.237140e-03   1.487790e-03 2.040423e-03
## cereals                3.900203e-03   3.887380e-03 1.882224e-03
## nuts and seeds         2.951833e-04   1.412885e-03 1.889703e-03
##                      alcoholic pasta and rice       fruits        honey
## beverages         0.0009720034   0.0041940242 1.985117e-03 1.710939e-03
## dairy products    0.0007094019   0.0029529230 5.165335e-04 4.936112e-04
## snacks            0.0002064478   0.0004390582 1.529892e-04 1.382333e-04
## meat              0.0007894749   0.0073059319 1.430406e-03 2.269692e-03
## ready-made meals  0.0028567432   0.0025710529 3.084537e-03 1.718738e-03
## vegetables        0.0011908743   0.0040719540 2.756040e-03 2.088824e-03
## fast food meals   0.0008319115   0.0017792435 1.317702e-03 4.653297e-04
## sauces and spices 0.0006601250   0.0020663988 5.197447e-05 3.963682e-04
## oils and flour    0.0007599232   0.0056686428 2.807934e-03 1.704030e-03
## fishes            0.0017672644   0.0047072422 2.850789e-03 3.258933e-03
## alcoholic                   NA   0.0018926169 4.461065e-04 2.036605e-04
## pasta and rice    0.0018926169             NA 7.857621e-04 7.354334e-04
## fruits            0.0004461065   0.0007857621           NA 1.605610e-03
## honey             0.0002036605   0.0007354334 1.605610e-03           NA
## bread             0.0023081480   0.0004086285 1.987282e-03 5.257814e-04
## cosmetics         0.0001597389   0.0006880178 3.147811e-03 6.986802e-04
## other             0.0015334232   0.0015561158 3.749786e-04 4.844963e-05
## cereals           0.0010890954   0.0006757293 5.240432e-04 2.331424e-04
## nuts and seeds    0.0003741646   0.0001443965 7.435227e-04 4.031261e-04
##                          bread    cosmetics        other      cereals
## beverages         0.0010541092 1.941643e-03 1.434418e-03 0.0032451062
## dairy products    0.0012849576 3.542176e-04 1.011014e-03 0.0014309939
## snacks            0.0002855593 1.528374e-03 2.199670e-04 0.0002563991
## meat              0.0010426197 3.069946e-03 1.685482e-03 0.0007408843
## ready-made meals  0.0003431462 7.286087e-03 1.290623e-03 0.0033992524
## vegetables        0.0027097444 2.116871e-03 1.539027e-03 0.0022750653
## fast food meals   0.0002921890 5.464640e-04 6.391755e-05 0.0003758046
## sauces and spices 0.0020505702 2.100859e-03 3.237140e-03 0.0039002033
## oils and flour    0.0015930607 6.628078e-03 1.487790e-03 0.0038873798
## fishes            0.0025541337 1.155031e-03 2.040423e-03 0.0018822244
## alcoholic         0.0023081480 1.597389e-04 1.533423e-03 0.0010890954
## pasta and rice    0.0004086285 6.880178e-04 1.556116e-03 0.0006757293
## fruits            0.0019872819 3.147811e-03 3.749786e-04 0.0005240432
## honey             0.0005257814 6.986802e-04 4.844963e-05 0.0002331424
## bread                       NA 3.038203e-04 1.979410e-03 0.0019099539
## cosmetics         0.0003038203           NA 1.217054e-03 0.0009280763
## other             0.0019794103 1.217054e-03           NA 0.0006084999
## cereals           0.0019099539 9.280763e-04 6.084999e-04           NA
## nuts and seeds    0.0004084832 5.051095e-07 1.430964e-03 0.0003984305
##                   nuts and seeds
## beverages           2.752967e-03
## dairy products      2.133281e-03
## snacks              3.261877e-04
## meat                1.391436e-03
## ready-made meals    1.544105e-03
## vegetables          2.275891e-03
## fast food meals     2.028864e-03
## sauces and spices   2.951833e-04
## oils and flour      1.412885e-03
## fishes              1.889703e-03
## alcoholic           3.741646e-04
## pasta and rice      1.443965e-04
## fruits              7.435227e-04
## honey               4.031261e-04
## bread               4.084832e-04
## cosmetics           5.051095e-07
## other               1.430964e-03
## cereals             3.984305e-04
## nuts and seeds                NA

One can see that in almost all cases the null hypothesis should be rejected, so the occurrence of products is not independent.

4.3.Apriori algorithm

The Apriori algorithm is used to create the itemsets and rules for the given data. It works level by level: it keeps only the itemsets whose support (the number of transactions containing all products in the itemset divided by the total number of transactions) reaches the chosen minimum, and then extends them by one element. The confidence of a rule is the number of transactions containing all products in the rule's itemset divided by the number of transactions containing the left-hand side of the rule (the antecedent itemset). To obtain any rules to analyse, the minimum confidence had to be lowered from the default value of 80% to 50%. Nine rules were obtained.
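
The two measures described above can also be written in symbols. The notation is introduced here only for illustration: N stands for the total number of transactions and n(X) for the number of transactions that contain every item of the itemset X.

```latex
\[
\operatorname{supp}(X) = \frac{n(X)}{N},
\qquad
\operatorname{conf}(X \Rightarrow Y)
  = \frac{n(X \cup Y)}{n(X)}
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
\]
```

Here X is the antecedent and Y the consequent, so the support reported for a rule is the support of the whole itemset, supp(X ∪ Y).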

rules.trans<-apriori(trans_level2, parameter=list(supp=0.1, conf=0.5)) 
rules.by.conf<-sort(rules.trans, by="confidence", decreasing=TRUE) 
inspect(rules.by.conf)
##     lhs                           rhs              support   confidence
## [1] {dairy products,snacks}    => {beverages}      0.1079856 0.5823149 
## [2] {beverages,snacks}         => {dairy products} 0.1079856 0.5656425 
## [3] {vegetables}               => {beverages}      0.1450473 0.5511651 
## [4] {ready-made meals}         => {beverages}      0.1491801 0.5397974 
## [5] {ready-made meals}         => {dairy products} 0.1451806 0.5253256 
## [6] {beverages,dairy products} => {snacks}         0.1079856 0.5209003 
## [7] {meat}                     => {beverages}      0.1538462 0.5186517 
## [8] {vegetables}               => {dairy products} 0.1339821 0.5091185 
## [9] {meat}                     => {dairy products} 0.1487802 0.5015730 
##     lift     count
## [1] 1.357347  810 
## [2] 1.338872  810 
## [3] 1.284739 1088 
## [4] 1.258241 1119 
## [5] 1.243442 1089 
## [6] 1.240011  810 
## [7] 1.208952 1154 
## [8] 1.205080 1005 
## [9] 1.187220 1116

One can see that the antecedent itemsets of those rules contain one or two items. All rules have quite similar support, below 0.16, and confidence below 0.6. Another useful measure is the lift, which equals the confidence divided by the expected confidence of the rule, that is, the support of the consequent. In other words, it says how many times more often the items appear together in a transaction than we would expect if they were independent. Beverages and dairy products each appear as the consequent four times. There is also one rule showing that people who buy beverages and dairy products also tend to buy snacks. The antecedents are more diversified.
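
The relation between confidence and lift can be checked by hand on the printed rules. The sketch below is only an illustration in base R: the support of beverages is not printed directly, so it is recovered here from rule [7] as confidence divided by lift, which is an assumption about how the reported values fit together, and the variable names are made up for this example.

```r
# Support of the consequent {beverages}, recovered from rule [7]
# ({meat} => {beverages}): confidence / lift. This is a derived value.
supp_beverages <- 0.5186517 / 1.208952

# Confidence of three other rules with {beverages} as the consequent,
# copied from the inspect() output (rules [1], [3] and [4]).
conf <- c("{dairy products,snacks}" = 0.5823149,
          "{vegetables}"            = 0.5511651,
          "{ready-made meals}"      = 0.5397974)

# Lift = confidence / support of the consequent
lift <- conf / supp_beverages
round(lift, 4)
```

The recomputed values agree with the lift column of the output up to rounding, which confirms that a lift of about 1.36 means the three items of rule [1] occur together roughly 36% more often than under independence.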

plot(rules.trans, measure=c("confidence","lift"), shading="support")

The graph also shows that the confidence and lift of the rules increase together, but the rules with the highest values of those two measures are the ones with the smallest support.

is.significant(rules.trans, trans_level2)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
is.maximal(rules.trans)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
is.redundant(rules.trans)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

All of those rules are significant according to Fisher’s exact test and all of them are maximal. None of them is redundant, so there isn’t any more general rule with at least as high confidence as theirs.
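
To make the significance check less of a black box, the sketch below runs Fisher’s exact test by hand for rule [7], {meat} => {beverages}. It is an illustration only: the counts are recovered from the printed support, confidence and lift, and a one-sided alternative with the 0.01 level is assumed here rather than taken from the settings of is.significant.

```r
# Counts recovered from the printed output of rule [7] (derived values)
n_both  <- 1154                                    # meat and beverages
n_total <- round(n_both / 0.1538462)               # count / support
n_lhs   <- round(n_both / 0.5186517)               # count / confidence (meat)
n_rhs   <- round(n_total * 0.5186517 / 1.208952)   # N * confidence / lift (beverages)

# 2x2 contingency table: rows = meat yes/no, columns = beverages yes/no
tab <- matrix(c(n_both,         n_lhs - n_both,
                n_rhs - n_both, n_total - n_lhs - n_rhs + n_both),
              nrow = 2, byrow = TRUE,
              dimnames = list(meat = c("yes", "no"),
                              beverages = c("yes", "no")))

# One-sided test: are the two groups bought together more often
# than expected under independence?
test <- fisher.test(tab, alternative = "greater")
test$p.value < 0.01
```

About 955 joint purchases would be expected under independence (2225 × 3218 / 7501) against 1154 observed, so the p-value is far below the assumed level.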

trans.closed<-apriori(trans_level2, 
                      parameter=list(target="closed frequent itemsets", support=0.01))

rules.closed<-ruleInduction(trans.closed, trans_level2, control=list(verbose=TRUE))
rules.closed
## set of 1 rules
inspect(rules.closed) 
##     lhs                   rhs            support confidence     lift itemset
## [1] {pasta and rice,                                                        
##      ready-made meals,                                                      
##      snacks,                                                                
##      vegetables}       => {beverages} 0.01013198  0.8085106 1.884599     600

Closed itemsets are the ones whose proper supersets all have lower support. Among the closed itemsets with support over 1%, only one yields a rule at the default minimum confidence of ruleInduction (0.8), but this rule has high confidence and lift. It comes from quite a long itemset with five items.
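
The definition of a closed itemset can be illustrated on a toy example. The transactions and helper functions below are hypothetical and written in base R for this sketch only; they are not part of the data set or of arules. Because support can only decrease when items are added, it is enough to check the supersets that are larger by one item.

```r
# Toy transactions (hypothetical, not taken from the analysed data)
transactions <- list(c("a", "b"), c("a", "b", "c"), c("a", "b"), c("c"))
all_items <- sort(unique(unlist(transactions)))

# Number of transactions that contain every item of the itemset
count_of <- function(items) {
  sum(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

# An itemset is closed if adding any further item strictly lowers the count
is_closed <- function(items) {
  extra <- setdiff(all_items, items)
  all(vapply(extra,
             function(e) count_of(c(items, e)) < count_of(items),
             logical(1)))
}

is_closed("a")          # FALSE: {a, b} occurs just as often as {a}
is_closed(c("a", "b"))  # TRUE: {a, b, c} occurs less often than {a, b}
```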

4.4.Dissimilarity

d.jac.i<-dissimilarity(trans_level2, which="items", method = "dice")
round(d.jac.i,2) 
##                   alcoholic beverages bread cereals cosmetics
## beverages              0.79                                  
## bread                  0.89      0.90                        
## cereals                0.93      0.92  0.92                  
## cosmetics              0.93      0.90  0.94    0.94          
## dairy products         0.79      0.51  0.89    0.93      0.92
## fast food meals        0.81      0.65  0.91    0.93      0.91
## fishes                 0.83      0.74  0.89    0.92      0.91
## fruits                 0.88      0.83  0.90    0.94      0.89
## honey                  0.92      0.89  0.93    0.95      0.93
## meat                   0.80      0.58  0.90    0.93      0.89
## nuts and seeds         0.94      0.92  0.94    0.95      0.97
## oils and flour         0.84      0.72  0.89    0.90      0.87
## other                  0.91      0.91  0.91    0.94      0.93
## pasta and rice         0.84      0.76  0.91    0.93      0.92
## ready-made meals       0.78      0.58  0.91    0.91      0.87
## sauces and spices      0.84      0.73  0.89    0.90      0.90
## snacks                 0.80      0.55  0.90    0.94      0.90
## vegetables             0.80      0.58  0.88    0.92      0.90
##                   dairy products fast food meals fishes fruits honey meat
## beverages                                                                
## bread                                                                    
## cereals                                                                  
## cosmetics                                                                
## dairy products                                                           
## fast food meals             0.64                                         
## fishes                      0.74            0.83                         
## fruits                      0.85            0.84   0.85                  
## honey                       0.90            0.90   0.88   0.90           
## meat                        0.59            0.69   0.76   0.84  0.89     
## nuts and seeds              0.92            0.92   0.92   0.93  0.94 0.93
## oils and flour              0.73            0.81   0.77   0.85  0.89 0.73
## other                       0.91            0.93   0.90   0.93  0.95 0.91
## pasta and rice              0.77            0.80   0.81   0.88  0.91 0.75
## ready-made meals            0.58            0.71   0.73   0.83  0.89 0.60
## sauces and spices           0.74            0.79   0.79   0.89  0.91 0.72
## snacks                      0.56            0.66   0.77   0.85  0.91 0.66
## vegetables                  0.61            0.70   0.70   0.83  0.89 0.63
##                   nuts and seeds oils and flour other pasta and rice
## beverages                                                           
## bread                                                               
## cereals                                                             
## cosmetics                                                           
## dairy products                                                      
## fast food meals                                                     
## fishes                                                              
## fruits                                                              
## honey                                                               
## meat                                                                
## nuts and seeds                                                      
## oils and flour              0.92                                    
## other                       0.93           0.91                     
## pasta and rice              0.94           0.80  0.91               
## ready-made meals            0.92           0.69  0.91           0.79
## sauces and spices           0.94           0.77  0.89           0.82
## snacks                      0.94           0.76  0.92           0.80
## vegetables                  0.92           0.73  0.91           0.78
##                   ready-made meals sauces and spices snacks
## beverages                                                  
## bread                                                      
## cereals                                                    
## cosmetics                                                  
## dairy products                                             
## fast food meals                                            
## fishes                                                     
## fruits                                                     
## honey                                                      
## meat                                                       
## nuts and seeds                                             
## oils and flour                                             
## other                                                      
## pasta and rice                                             
## ready-made meals                                           
## sauces and spices             0.71                         
## snacks                        0.65              0.76       
## vegetables                    0.61              0.76   0.65

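According to Dice’s coefficient, most pairs of product groups are very dissimilar: the majority of the values exceed 0.7 and even the closest pair, beverages and dairy products, has a dissimilarity of 0.51. Additionally, the dendrogram for those categories is presented below.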
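
As a cross-check of the table, the Dice dissimilarity of two items can be computed from three counts as one minus twice the number of joint transactions divided by the sum of the two item counts. The sketch below is illustrative: the function name is made up, and the counts are taken or recovered (count divided by confidence) from the rule outputs in this report, so they should be read as derived values.

```r
# Dice dissimilarity from counts: 1 - 2 * |A and B| / (|A| + |B|)
dice_dissimilarity <- function(n_both, n_a, n_b) {
  1 - 2 * n_both / (n_a + n_b)
}

# Item counts (derived from the rule outputs): beverages 3218,
# dairy products 3169, snacks 3151; pair counts 1555, 1432 and 1391.
d <- c("beverages / dairy products" = dice_dissimilarity(1555, 3218, 3169),
       "beverages / snacks"         = dice_dissimilarity(1432, 3218, 3151),
       "dairy products / snacks"    = dice_dissimilarity(1391, 3169, 3151))
round(d, 2)
```

The rounded results, 0.51, 0.55 and 0.56, agree with the corresponding cells of the dissimilarity matrix.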

plot(hclust(d.jac.i, method = "ward.D2"), main = "Dendrogram for items")

4.5.What makes people buy healthy or unhealthy food?

Now let’s focus on the four types of products that are especially connected with healthy or unhealthy eating: vegetables, fruits, snacks and fast food meals. To obtain rules with those specific consequents, the minimum support and confidence were lowered again, to 5% (2% for fruits) and 20% (10% for fruits) respectively.

Vegetables

rules.veg.r<-apriori(data=trans_level2, parameter=list(supp=0.05, conf=0.2),
                     appearance=list(default="lhs", rhs="vegetables"), control=list(verbose=FALSE))
rules.veg.r.bylift<-sort(rules.veg.r, by="lift", decreasing=TRUE)
inspect(rules.veg.r.bylift)
##      lhs                                  rhs          support   
## [1]  {fishes}                          => {vegetables} 0.05959205
## [2]  {beverages,ready-made meals}      => {vegetables} 0.06719104
## [3]  {meat,ready-made meals}           => {vegetables} 0.05172644
## [4]  {ready-made meals,snacks}         => {vegetables} 0.05239301
## [5]  {meat,snacks}                     => {vegetables} 0.05132649
## [6]  {beverages,meat}                  => {vegetables} 0.06372484
## [7]  {dairy products,ready-made meals} => {vegetables} 0.05972537
## [8]  {dairy products,meat}             => {vegetables} 0.05959205
## [9]  {beverages,snacks}                => {vegetables} 0.07452340
## [10] {beverages,dairy products}        => {vegetables} 0.08078923
## [11] {oils and flour}                  => {vegetables} 0.05345954
## [12] {ready-made meals}                => {vegetables} 0.10398614
## [13] {dairy products,snacks}           => {vegetables} 0.06972404
## [14] {meat}                            => {vegetables} 0.10358619
## [15] {beverages}                       => {vegetables} 0.14504733
## [16] {sauces and spices}               => {vegetables} 0.05079323
## [17] {dairy products}                  => {vegetables} 0.13398214
## [18] {fast food meals}                 => {vegetables} 0.07838955
## [19] {snacks}                          => {vegetables} 0.11985069
## [20] {}                                => {vegetables} 0.26316491
##      confidence lift     count
## [1]  0.4651405  1.767487  447 
## [2]  0.4504021  1.711483  504 
## [3]  0.4470046  1.698572  388 
## [4]  0.4281046  1.626754  393 
## [5]  0.4171181  1.585006  385 
## [6]  0.4142114  1.573961  478 
## [7]  0.4113866  1.563227  448 
## [8]  0.4005376  1.522002  447 
## [9]  0.3903631  1.483340  559 
## [10] 0.3897106  1.480861  606 
## [11] 0.3885659  1.476511  401 
## [12] 0.3762663  1.429774  780 
## [13] 0.3759885  1.428718  523 
## [14] 0.3492135  1.326976  777 
## [15] 0.3380982  1.284739 1088 
## [16] 0.3330420  1.265526  381 
## [17] 0.3171347  1.205080 1005 
## [18] 0.3030928  1.151722  588 
## [19] 0.2853063  1.084135  899 
## [20] 0.2631649  1.000000 1974
plot(rules.veg.r, method="paracoord", control=list(reorder=TRUE))

There are 19 rules plus the rule with an empty antecedent, that is, with only vegetables in the itemset. One can see that, according to lift, the strongest antecedent is, surprisingly, fish, followed by itemsets that combine ready-made meals with something else (beverages, meat, snacks). Snacks appear a few times, even alone in the antecedent, although that rule comes last with a lift only minimally higher than one. Fast food meals appear only alone on the left-hand side, with vegetables occurring only 15% more often than under independence. Surprisingly, fruits do not appear in any of the rules.

is.significant(rules.veg.r, trans_level2)
##  [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
inspect(rules.veg.r[!is.significant(rules.veg.r, trans_level2)])
##     lhs    rhs          support   confidence lift count
## [1] {}  => {vegetables} 0.2631649 0.2631649  1    1974
is.redundant(rules.veg.r)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

All of the rules are significant except the one with no antecedent, and none of them is redundant.

Fruits

rules.fru.r<-apriori(data=trans_level2, parameter=list(supp=0.02, conf=0.1),
                     appearance=list(default="lhs", rhs="fruits"), control=list(verbose=FALSE))
rules.fru.r.bylift<-sort(rules.fru.r, by="lift", decreasing=TRUE)
inspect(rules.fru.r.bylift)
##     lhs                           rhs      support    confidence lift    
## [1] {beverages,dairy products} => {fruits} 0.02373017 0.1144695  1.393889
## [2] {ready-made meals}         => {fruits} 0.03106252 0.1123975  1.368658
## [3] {vegetables}               => {fruits} 0.02932942 0.1114488  1.357107
## [4] {dairy products,snacks}    => {fruits} 0.02026396 0.1092739  1.330623
## [5] {beverages,snacks}         => {fruits} 0.02079723 0.1089385  1.326539
## [6] {fast food meals}          => {fruits} 0.02652980 0.1025773  1.249079
## [7] {meat}                     => {fruits} 0.03026263 0.1020225  1.242322
## [8] {beverages}                => {fruits} 0.04359419 0.1016159  1.237372
##     count
## [1] 178  
## [2] 233  
## [3] 220  
## [4] 152  
## [5] 156  
## [6] 199  
## [7] 227  
## [8] 327
plot(rules.fru.r, method="graph")

There weren’t any rules for fruits at the support and confidence levels used for the other products, so the thresholds had to be lowered even more. Judging by lift, vegetables are among the antecedents most strongly associated with buying fruits (third of the eight rules), but generally all those rules are quite weak.

is.significant(rules.fru.r, trans_level2)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
is.redundant(rules.fru.r)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

All of the rules are significant and none of them is redundant.

Snacks

rules.snacks.r<-apriori(data=trans_level2, parameter=list(supp=0.05, conf=0.2),
                        appearance=list(default="lhs", rhs="snacks"), control=list(verbose=FALSE))
rules.snacks.r.bylift<-sort(rules.snacks.r, by="lift", decreasing=TRUE)
inspect(rules.snacks.r.bylift)
##      lhs                                  rhs      support    confidence
## [1]  {beverages,fast food meals}       => {snacks} 0.06479136 0.5358324 
## [2]  {dairy products,fast food meals}  => {snacks} 0.06532462 0.5257511 
## [3]  {beverages,dairy products}        => {snacks} 0.10798560 0.5209003 
## [4]  {dairy products,vegetables}       => {snacks} 0.06972404 0.5203980 
## [5]  {beverages,vegetables}            => {snacks} 0.07452340 0.5137868 
## [6]  {ready-made meals,vegetables}     => {snacks} 0.05239301 0.5038462 
## [7]  {beverages,ready-made meals}      => {snacks} 0.07452340 0.4995532 
## [8]  {meat,vegetables}                 => {snacks} 0.05132649 0.4954955 
## [9]  {fishes}                          => {snacks} 0.06265831 0.4890739 
## [10] {dairy products,ready-made meals} => {snacks} 0.07052393 0.4857668 
## [11] {dairy products,meat}             => {snacks} 0.07225703 0.4856631 
## [12] {oils and flour}                  => {snacks} 0.06652446 0.4835271 
## [13] {beverages,meat}                  => {snacks} 0.07345687 0.4774697 
## [14] {meat,ready-made meals}           => {snacks} 0.05479269 0.4735023 
## [15] {pasta and rice}                  => {snacks} 0.05399280 0.4597049 
## [16] {vegetables}                      => {snacks} 0.11985069 0.4554205 
## [17] {fast food meals}                 => {snacks} 0.11705106 0.4525773 
## [18] {sauces and spices}               => {snacks} 0.06852420 0.4493007 
## [19] {alcoholic}                       => {snacks} 0.05385949 0.4469027 
## [20] {beverages}                       => {snacks} 0.19090788 0.4449969 
## [21] {ready-made meals}                => {snacks} 0.12238368 0.4428365 
## [22] {dairy products}                  => {snacks} 0.18544194 0.4389397 
## [23] {}                                => {snacks} 0.42007732 0.4200773 
## [24] {meat}                            => {snacks} 0.12305026 0.4148315 
##      lift      count
## [1]  1.2755566  486 
## [2]  1.2515579  490 
## [3]  1.2400106  810 
## [4]  1.2388148  523 
## [5]  1.2230766  559 
## [6]  1.1994129  393 
## [7]  1.1891934  559 
## [8]  1.1795340  385 
## [9]  1.1642473  470 
## [10] 1.1563746  529 
## [11] 1.1561278  542 
## [12] 1.1510432  499 
## [13] 1.1366233  551 
## [14] 1.1271789  411 
## [15] 1.0943340  405 
## [16] 1.0841349  899 
## [17] 1.0773667  878 
## [18] 1.0695667  514 
## [19] 1.0638581  404 
## [20] 1.0593214 1432 
## [21] 1.0541785  918 
## [22] 1.0449022 1391 
## [23] 1.0000000 3151 
## [24] 0.9875122  923
plot(rules.snacks.r, method="paracoord", control=list(reorder=TRUE))

Snacks have the most rules. The lift values aren’t as high as for the vegetables rules, but there are many interesting combinations. Fast food meals appear in the two top antecedent itemsets, so it seems that unhealthy food sticks together.

is.significant(rules.snacks.r, trans_level2)
##  [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE
## [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [23]  TRUE  TRUE
inspect(rules.snacks.r[!is.significant(rules.snacks.r, trans_level2)])
##     lhs                    rhs      support    confidence lift      count
## [1] {}                  => {snacks} 0.42007732 0.4200773  1.0000000 3151 
## [2] {alcoholic}         => {snacks} 0.05385949 0.4469027  1.0638581  404 
## [3] {pasta and rice}    => {snacks} 0.05399280 0.4597049  1.0943340  405 
## [4] {sauces and spices} => {snacks} 0.06852420 0.4493007  1.0695667  514 
## [5] {fast food meals}   => {snacks} 0.11705106 0.4525773  1.0773667  878 
## [6] {ready-made meals}  => {snacks} 0.12238368 0.4428365  1.0541785  918 
## [7] {meat}              => {snacks} 0.12305026 0.4148315  0.9875122  923 
## [8] {dairy products}    => {snacks} 0.18544194 0.4389397  1.0449022 1391
is.redundant(rules.snacks.r)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE
inspect(rules.snacks.r[is.redundant(rules.snacks.r)])
##     lhs       rhs      support   confidence lift      count
## [1] {meat} => {snacks} 0.1230503 0.4148315  0.9875122 923

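Besides the rule with the empty antecedent, seven rules with a single item as the antecedent are not significant, including the rule with fast food meals. One of them, with meat on the left-hand side, is also redundant: its confidence (0.415) is lower than that of the rule with the empty antecedent (0.420), which is also why its lift is below one.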
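
The redundancy of the meat rule can be reproduced by hand. The sketch below assumes the usual confidence-based criterion (a rule is redundant when a more general rule with the same consequent has at least the same confidence) and compares three single-item rules from the output above with the rule that has an empty antecedent. The variable names are introduced for this illustration only.

```r
# Confidence of {} => {snacks}, copied from the inspect() output
conf_empty <- 0.4200773

# Confidence of three single-item rules with {snacks} as the consequent
conf_single <- c("meat"            = 0.4148315,
                 "fast food meals" = 0.4525773,
                 "dairy products"  = 0.4389397)

# For a single-item antecedent the only more general rule is the one with
# the empty antecedent, so the rule is redundant when it does not beat it.
redundant <- conf_single <= conf_empty
redundant
```

Only the meat rule is flagged, which agrees with the is.redundant output.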

Fast food meals

rules.ff.r<-apriori(data=trans_level2, parameter=list(supp=0.05, conf=0.2),
                    appearance=list(default="lhs", rhs="fast food meals"), control=list(verbose=FALSE))
rules.ff.r.bylift<-sort(rules.ff.r, by="lift", decreasing=TRUE)
inspect(rules.ff.r.bylift)
##      lhs                           rhs               support    confidence
## [1]  {dairy products,snacks}    => {fast food meals} 0.06532462 0.3522646 
## [2]  {beverages,snacks}         => {fast food meals} 0.06479136 0.3393855 
## [3]  {beverages,dairy products} => {fast food meals} 0.06799093 0.3279743 
## [4]  {vegetables}               => {fast food meals} 0.07838955 0.2978723 
## [5]  {dairy products}           => {fast food meals} 0.12425010 0.2940991 
## [6]  {meat}                     => {fast food meals} 0.08638848 0.2912360 
## [7]  {ready-made meals}         => {fast food meals} 0.07798960 0.2821997 
## [8]  {beverages}                => {fast food meals} 0.12091721 0.2818521 
## [9]  {snacks}                   => {fast food meals} 0.11705106 0.2786417 
## [10] {}                         => {fast food meals} 0.25863218 0.2586322 
##      lift     count
## [1]  1.362029  490 
## [2]  1.312232  486 
## [3]  1.268111  510 
## [4]  1.151722  588 
## [5]  1.137133  932 
## [6]  1.126062  648 
## [7]  1.091124  585 
## [8]  1.089780  907 
## [9]  1.077367  878 
## [10] 1.000000 1940
plot(rules.ff.r, method="graph")

The results for fast food meals confirm the pattern seen for snacks: the two strongest rules by lift combine snacks with dairy products or with beverages, and they come from the same itemsets as the two top rules in the part about snacks.

is.significant(rules.ff.r, trans_level2)
##  [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
inspect(rules.ff.r[!is.significant(rules.ff.r, trans_level2)])
##     lhs                   rhs               support   confidence lift    
## [1] {}                 => {fast food meals} 0.2586322 0.2586322  1.000000
## [2] {ready-made meals} => {fast food meals} 0.0779896 0.2821997  1.091124
##     count
## [1] 1940 
## [2]  585
is.redundant(rules.ff.r)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

In this case the rule with the empty antecedent (fast food meals alone) and the rule of ready-made meals leading to buying fast food meals are not significant. None of the rules is redundant.

5.Summary

Association rules aren’t completely obvious even after the market basket analysis, but this technique gives a great opportunity to mine through the data and look for unique patterns that hold hidden truths about customers’ behavior.