Grocery Item Prediction

With this data set, we will use market basket analysis to predict what someone is most likely to buy based on the other items they have bought. The data is already in transaction form, so we do not need to convert it.

library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(datasets)
data(Groceries)
library(arulesViz)
## Loading required package: grid
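As a quick check that the data really is in transaction form, a minimal sketch along the following lines could be run. The `raw_baskets` list is a hypothetical example, included only to show what a conversion step might look like for data that is not yet in this form.

```r
library(arules)
data(Groceries)

# TRUE if the object is already an arules "transactions" object
print(is(Groceries, "transactions"))

# Number of transactions (rows) and distinct items (columns)
print(dim(Groceries))

# Hypothetical raw data: one character vector of items per basket
raw_baskets <- list(c("milk", "bread"), c("bread", "butter"), c("milk"))

# For a plain list like this, as(..., "transactions") is the conversion step
converted <- as(raw_baskets, "transactions")
print(length(converted))
```

Because `Groceries` already passes the first check, the conversion at the end is not needed for this analysis.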

Grocery Item Frequency

The table below shows the relative frequency of each grocery item, that is, the share of transactions that contain it. However, it is cumbersome to determine anything from this table, so we will put the items with the highest frequency in a graph.

##               frankfurter                   sausage 
##              0.0589730554              0.0939501779 
##                liver loaf                       ham 
##              0.0050838841              0.0260294865 
##                      meat         finished products 
##              0.0258261312              0.0065073716 
##           organic sausage                   chicken 
##              0.0022369090              0.0429079817 
##                    turkey                      pork 
##              0.0081342145              0.0576512456 
##                      beef            hamburger meat 
##              0.0524656838              0.0332486019 
##                      fish              citrus fruit 
##              0.0029486528              0.0827656329 
##            tropical fruit                 pip fruit 
##              0.1049313676              0.0756481952 
##                    grapes                   berries 
##              0.0223690900              0.0332486019 
##               nuts/prunes           root vegetables 
##              0.0033553635              0.1089984748 
##                    onions                     herbs 
##              0.0310116929              0.0162684291 
##          other vegetables packaged fruit/vegetables 
##              0.1934926284              0.0130147433 
##                whole milk                    butter 
##              0.2555160142              0.0554143366 
##                      curd                   dessert 
##              0.0532791052              0.0371123538 
##               butter milk                    yogurt 
##              0.0279613625              0.1395017794 
##        whipped/sour cream                 beverages 
##              0.0716827656              0.0260294865 
##                  UHT-milk            condensed milk 
##              0.0334519573              0.0102694459 
##                     cream               soft cheese 
##              0.0013218099              0.0170818505 
##             sliced cheese               hard cheese 
##              0.0245043213              0.0245043213 
##             cream cheese           processed cheese 
##              0.0396542959              0.0165734621 
##             spread cheese               curd cheese 
##              0.0111845450              0.0050838841 
##          specialty cheese                mayonnaise 
##              0.0085409253              0.0091509914 
##            salad dressing                   tidbits 
##              0.0008134215              0.0023385867 
##         frozen vegetables             frozen fruits 
##              0.0480935435              0.0012201322 
##              frozen meals               frozen fish 
##              0.0283680732              0.0116929334 
##            frozen chicken                 ice cream 
##              0.0006100661              0.0250127097 
##            frozen dessert    frozen potato products 
##              0.0107778343              0.0084392476 
##             domestic eggs                rolls/buns 
##              0.0634468734              0.1839349263 
##               white bread               brown bread 
##              0.0420945602              0.0648703610 
##                    pastry            roll products  
##              0.0889679715              0.0102694459 
##       semi-finished bread                  zwieback 
##              0.0176919166              0.0069140824 
##           potato products                     flour 
##              0.0028469751              0.0173868836 
##                      salt                      rice 
##              0.0107778343              0.0076258261 
##                     pasta                   vinegar 
##              0.0150482969              0.0065073716 
##                       oil                 margarine 
##              0.0280630402              0.0585663447 
##             specialty fat                     sugar 
##              0.0036603965              0.0338586680 
##          artif. sweetener                     honey 
##              0.0032536858              0.0015251652 
##                   mustard                   ketchup 
##              0.0119979664              0.0042704626 
##                    spices                     soups 
##              0.0051855618              0.0068124047 
##               ready soups     Instant food products 
##              0.0018301983              0.0080325369 
##                    sauces                   cereals 
##              0.0054905948              0.0056939502 
##          organic products             baking powder 
##              0.0016268429              0.0176919166 
##     preservation products            pudding powder 
##              0.0002033554              0.0023385867 
##         canned vegetables              canned fruit 
##              0.0107778343              0.0032536858 
##        pickled vegetables      specialty vegetables 
##              0.0178952720              0.0017285206 
##                       jam             sweet spreads 
##              0.0053889171              0.0090493137 
##              meat spreads               canned fish 
##              0.0042704626              0.0150482969 
##                  dog food                  cat food 
##              0.0085409253              0.0232841891 
##                  pet care                 baby food 
##              0.0094560244              0.0001016777 
##                    coffee            instant coffee 
##              0.0580579563              0.0074224708 
##                       tea              cocoa drinks 
##              0.0038637519              0.0022369090 
##             bottled water                      soda 
##              0.1105236401              0.1743772242 
##           misc. beverages     fruit/vegetable juice 
##              0.0283680732              0.0722928317 
##                     syrup              bottled beer 
##              0.0032536858              0.0805287239 
##               canned beer                    brandy 
##              0.0776817489              0.0041687850 
##                    whisky                    liquor 
##              0.0008134215              0.0110828673 
##                       rum                   liqueur 
##              0.0044738180              0.0009150991 
##        liquor (appetizer)                white wine 
##              0.0079308592              0.0190137265 
##            red/blush wine                  prosecco 
##              0.0192170819              0.0020335536 
##            sparkling wine               salty snack 
##              0.0055922725              0.0378240976 
##                   popcorn                 nut snack 
##              0.0072191154              0.0031520081 
##            snack products  long life bakery product 
##              0.0030503305              0.0374173869 
##                   waffles                  cake bar 
##              0.0384341637              0.0132180986 
##               chewing gum                 chocolate 
##              0.0210472801              0.0496187087 
##         cooking chocolate       specialty chocolate 
##              0.0025419420              0.0304016268 
##             specialty bar     chocolate marshmallow 
##              0.0273512964              0.0090493137 
##                     candy         seasonal products 
##              0.0298932384              0.0142348754 
##                 detergent                  softener 
##              0.0192170819              0.0054905948 
##               decalcifier              dish cleaner 
##              0.0015251652              0.0104728012 
##          abrasive cleaner                   cleaner 
##              0.0035587189              0.0050838841 
##            toilet cleaner          bathroom cleaner 
##              0.0007117438              0.0027452974 
##                hair spray               dental care 
##              0.0011184545              0.0057956279 
##            male cosmetics           make up remover 
##              0.0045754957              0.0008134215 
##                 skin care  female sanitary products 
##              0.0035587189              0.0061006609 
##            baby cosmetics                      soap 
##              0.0006100661              0.0026436197 
##           rubbing alcohol          hygiene articles 
##              0.0010167768              0.0329435689 
##                   napkins                    dishes 
##              0.0523640061              0.0175902389 
##                  cookware           kitchen utensil 
##              0.0027452974              0.0004067107 
##           cling film/bags            kitchen towels 
##              0.0113879004              0.0059989832 
##    house keeping products                   candles 
##              0.0083375699              0.0089476360 
##               light bulbs      sound storage medium 
##              0.0041687850              0.0001016777 
##                newspapers                photo/film 
##              0.0798169802              0.0092526690 
##                pot plants    flower soil/fertilizer 
##              0.0172852059              0.0019318760 
##            flower (seeds)             shopping bags 
##              0.0103711235              0.0985256736 
##                      bags 
##              0.0004067107

According to this graph, there are thirteen items with a frequency higher than 8%. Some of these items include whole milk, rolls/buns, and other vegetables. Next we will determine some rules based on these item frequencies.
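The calls that produced the frequency table and the graph are not shown in this report. A minimal sketch of how they could be reproduced, assuming the `itemFrequency()` and `itemFrequencyPlot()` functions from `arules` and the 8% cutoff mentioned above, might look like this:

```r
library(arules)
data(Groceries)

# Relative frequency of every item, returned as a named numeric vector
item_freq <- itemFrequency(Groceries, type = "relative")

# Keep only the items found in more than 8% of transactions, most frequent first
top_items <- sort(item_freq[item_freq > 0.08], decreasing = TRUE)
print(round(top_items, 4))

# Bar chart of the same items; `support` sets the minimum frequency to plot
itemFrequencyPlot(Groceries, support = 0.08)
```

Filtering the named vector first makes it easy to count and list the frequent items without reading them off the chart.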

Market Basket Rules

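Below are the rules for what someone is most likely to buy based on what they have already bought, mined with a minimum support of 0.01 and a minimum confidence of 0.5. Not all of these rules are equally important, because they vary in support (the share of all transactions that contain every item in the rule), confidence (the share of transactions containing the left-hand side that also contain the right-hand side), lift (confidence divided by the overall frequency of the right-hand side), and count (the number of transactions that match the rule). For this reason, the second table displays only the rules with a lift above 2.25, sorted by their level of support.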

groceryrules <- apriori(Groceries, parameter = list(support = 0.01, confidence = 0.5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(groceryrules)
##      lhs                     rhs                   support confidence     lift count
## [1]  {curd,                                                                         
##       yogurt}             => {whole milk}       0.01006609  0.5823529 2.279125    99
## [2]  {other vegetables,                                                             
##       butter}             => {whole milk}       0.01148958  0.5736041 2.244885   113
## [3]  {other vegetables,                                                             
##       domestic eggs}      => {whole milk}       0.01230300  0.5525114 2.162336   121
## [4]  {yogurt,                                                                       
##       whipped/sour cream} => {whole milk}       0.01087951  0.5245098 2.052747   107
## [5]  {other vegetables,                                                             
##       whipped/sour cream} => {whole milk}       0.01464159  0.5070423 1.984385   144
## [6]  {pip fruit,                                                                    
##       other vegetables}   => {whole milk}       0.01352313  0.5175097 2.025351   133
## [7]  {citrus fruit,                                                                 
##       root vegetables}    => {other vegetables} 0.01037112  0.5862069 3.029608   102
## [8]  {tropical fruit,                                                               
##       root vegetables}    => {other vegetables} 0.01230300  0.5845411 3.020999   121
## [9]  {tropical fruit,                                                               
##       root vegetables}    => {whole milk}       0.01199797  0.5700483 2.230969   118
## [10] {tropical fruit,                                                               
##       yogurt}             => {whole milk}       0.01514997  0.5173611 2.024770   149
## [11] {root vegetables,                                                              
##       yogurt}             => {other vegetables} 0.01291307  0.5000000 2.584078   127
## [12] {root vegetables,                                                              
##       yogurt}             => {whole milk}       0.01453991  0.5629921 2.203354   143
## [13] {root vegetables,                                                              
##       rolls/buns}         => {other vegetables} 0.01220132  0.5020921 2.594890   120
## [14] {root vegetables,                                                              
##       rolls/buns}         => {whole milk}       0.01270971  0.5230126 2.046888   125
## [15] {other vegetables,                                                             
##       yogurt}             => {whole milk}       0.02226741  0.5128806 2.007235   219
inspect(sort(subset(groceryrules, subset = lift > 2.25), by = "support"))
##     lhs                  rhs                   support confidence     lift count
## [1] {root vegetables,                                                           
##      yogurt}          => {other vegetables} 0.01291307  0.5000000 2.584078   127
## [2] {tropical fruit,                                                            
##      root vegetables} => {other vegetables} 0.01230300  0.5845411 3.020999   121
## [3] {root vegetables,                                                           
##      rolls/buns}      => {other vegetables} 0.01220132  0.5020921 2.594890   120
## [4] {citrus fruit,                                                              
##      root vegetables} => {other vegetables} 0.01037112  0.5862069 3.029608   102
## [5] {curd,                                                                      
##      yogurt}          => {whole milk}       0.01006609  0.5823529 2.279125    99
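The measures in these tables are linked by simple arithmetic. As a rough illustration, the sketch below recomputes the support and lift of the rule {curd, yogurt} => {whole milk} from numbers printed elsewhere in this report. The variable names are illustrative, and the use of `floor()` for the minimum count is an assumption that happens to match the 98 printed in the apriori log.

```r
# Values copied from the output in this report
n_transactions <- 9835          # total transactions in Groceries
rule_count     <- 99            # baskets with curd, yogurt and whole milk
confidence     <- 0.5823529     # reported confidence of the rule
support_rhs    <- 0.2555160142  # item frequency of whole milk

# Support: share of all transactions that contain every item in the rule
support <- rule_count / n_transactions

# Lift: confidence relative to how often the right-hand side occurs anyway
lift <- confidence / support_rhs

# Minimum count implied by the 0.01 support threshold
# (floor() is an assumption, see the note above)
min_count <- floor(0.01 * n_transactions)

print(c(support = support, lift = lift, min_count = min_count))
```

Up to rounding, this should reproduce the reported support of 0.01006609 and lift of 2.279125. A lift above 1 means the right-hand side is bought more often alongside the left-hand side than it is overall.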

We can look at the support, confidence, and lift levels to determine which rules are the most significant. Overall, the levels are modest: each rule in the second table covers only about 1% of all transactions, confidence ranges from 50% to 59%, and lift from about 2.3 to 3.0. However, I would say that rule 2 in the second table, {tropical fruit, root vegetables} => {other vegetables}, would be the most interesting to look at. It has the second highest support, the second highest confidence, and the second highest lift. It also has the second highest count, showing that there are 121 cases of this rule occurring. I would look at this one rather than the one with the highest confidence and lift (rule 4, {citrus fruit, root vegetables} => {other vegetables}) because rule 4 has lower support and occurs in only 102 transactions.
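The comparison above can be checked with a small base R sketch that ranks the five rules on each measure. The values are copied from the second table. The unweighted mean rank at the end is only one possible way to summarise the trade-off, chosen here for illustration, not a standard measure.

```r
# The five rules with lift > 2.25, values copied from the second table
rules <- data.frame(
  rule       = 1:5,
  support    = c(0.01291307, 0.01230300, 0.01220132, 0.01037112, 0.01006609),
  confidence = c(0.5000000, 0.5845411, 0.5020921, 0.5862069, 0.5823529),
  lift       = c(2.584078, 3.020999, 2.594890, 3.029608, 2.279125),
  count      = c(127, 121, 120, 102, 99)
)

# Rank each measure so that 1 = highest value
measures <- c("support", "confidence", "lift", "count")
ranks <- sapply(rules[measures], function(x) rank(-x))
rownames(ranks) <- paste("rule", rules$rule)
print(ranks)

# Illustrative summary: a lower mean rank means stronger across all measures
mean_rank <- rowMeans(ranks)
print(sort(mean_rank))
```

Rule 2 ranks second on every measure, while rule 4 ranks first on confidence and lift but fourth on support and count, which is the trade-off described above.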