This data set contains 30 days of point-of-sale transaction data from a typical local grocery outlet: 9,835 transactions covering 169 items. We use Market Basket Analysis to find association rules, that is, which items a customer is likely to buy given the items already in the basket.

library("arules")
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
data(Groceries)
summary(Groceries)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55 
##   16   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   46   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##        labels  level2           level1
## 1 frankfurter sausage meat and sausage
## 2     sausage sausage meat and sausage
## 3  liver loaf sausage meat and sausage

The data set has 9,835 rows, one per transaction, and 169 columns, one per item. The most frequently bought items are whole milk, other vegetables, rolls/buns, soda, and yogurt. The median basket holds 3 items and the largest holds 32.
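As a quick sanity check, the density and mean basket size reported by summary() can be recomputed from the item counts alone. The sketch below uses only base R and figures copied from the output above; the variable names are my own.

```r
# Figures copied from the summary(Groceries) output
n_transactions <- 9835
n_items        <- 169
item_counts    <- c(2513, 1903, 1809, 1715, 1372, 34055)  # top five items plus "(Other)"

# Total number of item occurrences across all baskets
total_purchases <- sum(item_counts)

# Density = share of cells in the 9835 x 169 matrix that are TRUE
density <- total_purchases / (n_transactions * n_items)

# Mean basket size = item occurrences per transaction
mean_basket <- total_purchases / n_transactions

print(total_purchases)
print(round(density, 8))
print(round(mean_basket, 3))
```

A density of about 0.026 means that a typical basket holds only around 2.6% of the 169 items, which is why the data is stored in sparse format.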

aa <- as(Groceries, "matrix") # converts the sparse transactions object into a dense logical incidence matrix
aa[1:2,]
##      frankfurter sausage liver loaf   ham  meat finished products
## [1,]       FALSE   FALSE      FALSE FALSE FALSE             FALSE
## [2,]       FALSE   FALSE      FALSE FALSE FALSE             FALSE
##      organic sausage chicken turkey  pork  beef hamburger meat  fish
## [1,]           FALSE   FALSE  FALSE FALSE FALSE          FALSE FALSE
## [2,]           FALSE   FALSE  FALSE FALSE FALSE          FALSE FALSE
##      citrus fruit tropical fruit pip fruit grapes berries nuts/prunes
## [1,]         TRUE          FALSE     FALSE  FALSE   FALSE       FALSE
## [2,]        FALSE           TRUE     FALSE  FALSE   FALSE       FALSE
##      root vegetables onions herbs other vegetables
## [1,]           FALSE  FALSE FALSE            FALSE
## [2,]           FALSE  FALSE FALSE            FALSE
##      packaged fruit/vegetables whole milk butter  curd dessert butter milk
## [1,]                     FALSE      FALSE  FALSE FALSE   FALSE       FALSE
## [2,]                     FALSE      FALSE  FALSE FALSE   FALSE       FALSE
##      yogurt whipped/sour cream beverages UHT-milk condensed milk cream
## [1,]  FALSE              FALSE     FALSE    FALSE          FALSE FALSE
## [2,]   TRUE              FALSE     FALSE    FALSE          FALSE FALSE
##      soft cheese sliced cheese hard cheese cream cheese  processed cheese
## [1,]       FALSE         FALSE       FALSE         FALSE            FALSE
## [2,]       FALSE         FALSE       FALSE         FALSE            FALSE
##      spread cheese curd cheese specialty cheese mayonnaise salad dressing
## [1,]         FALSE       FALSE            FALSE      FALSE          FALSE
## [2,]         FALSE       FALSE            FALSE      FALSE          FALSE
##      tidbits frozen vegetables frozen fruits frozen meals frozen fish
## [1,]   FALSE             FALSE         FALSE        FALSE       FALSE
## [2,]   FALSE             FALSE         FALSE        FALSE       FALSE
##      frozen chicken ice cream frozen dessert frozen potato products
## [1,]          FALSE     FALSE          FALSE                  FALSE
## [2,]          FALSE     FALSE          FALSE                  FALSE
##      domestic eggs rolls/buns white bread brown bread pastry
## [1,]         FALSE      FALSE       FALSE       FALSE  FALSE
## [2,]         FALSE      FALSE       FALSE       FALSE  FALSE
##      roll products  semi-finished bread zwieback potato products flour
## [1,]          FALSE                TRUE    FALSE           FALSE FALSE
## [2,]          FALSE               FALSE    FALSE           FALSE FALSE
##       salt  rice pasta vinegar   oil margarine specialty fat sugar
## [1,] FALSE FALSE FALSE   FALSE FALSE      TRUE         FALSE FALSE
## [2,] FALSE FALSE FALSE   FALSE FALSE     FALSE         FALSE FALSE
##      artif. sweetener honey mustard ketchup spices soups ready soups
## [1,]            FALSE FALSE   FALSE   FALSE  FALSE FALSE        TRUE
## [2,]            FALSE FALSE   FALSE   FALSE  FALSE FALSE       FALSE
##      Instant food products sauces cereals organic products baking powder
## [1,]                 FALSE  FALSE   FALSE            FALSE         FALSE
## [2,]                 FALSE  FALSE   FALSE            FALSE         FALSE
##      preservation products pudding powder canned vegetables canned fruit
## [1,]                 FALSE          FALSE             FALSE        FALSE
## [2,]                 FALSE          FALSE             FALSE        FALSE
##      pickled vegetables specialty vegetables   jam sweet spreads
## [1,]              FALSE                FALSE FALSE         FALSE
## [2,]              FALSE                FALSE FALSE         FALSE
##      meat spreads canned fish dog food cat food pet care baby food coffee
## [1,]        FALSE       FALSE    FALSE    FALSE    FALSE     FALSE  FALSE
## [2,]        FALSE       FALSE    FALSE    FALSE    FALSE     FALSE   TRUE
##      instant coffee   tea cocoa drinks bottled water  soda misc. beverages
## [1,]          FALSE FALSE        FALSE         FALSE FALSE           FALSE
## [2,]          FALSE FALSE        FALSE         FALSE FALSE           FALSE
##      fruit/vegetable juice syrup bottled beer canned beer brandy whisky
## [1,]                 FALSE FALSE        FALSE       FALSE  FALSE  FALSE
## [2,]                 FALSE FALSE        FALSE       FALSE  FALSE  FALSE
##      liquor   rum liqueur liquor (appetizer) white wine red/blush wine
## [1,]  FALSE FALSE   FALSE              FALSE      FALSE          FALSE
## [2,]  FALSE FALSE   FALSE              FALSE      FALSE          FALSE
##      prosecco sparkling wine salty snack popcorn nut snack snack products
## [1,]    FALSE          FALSE       FALSE   FALSE     FALSE          FALSE
## [2,]    FALSE          FALSE       FALSE   FALSE     FALSE          FALSE
##      long life bakery product waffles cake bar chewing gum chocolate
## [1,]                    FALSE   FALSE    FALSE       FALSE     FALSE
## [2,]                    FALSE   FALSE    FALSE       FALSE     FALSE
##      cooking chocolate specialty chocolate specialty bar
## [1,]             FALSE               FALSE         FALSE
## [2,]             FALSE               FALSE         FALSE
##      chocolate marshmallow candy seasonal products detergent softener
## [1,]                 FALSE FALSE             FALSE     FALSE    FALSE
## [2,]                 FALSE FALSE             FALSE     FALSE    FALSE
##      decalcifier dish cleaner abrasive cleaner cleaner toilet cleaner
## [1,]       FALSE        FALSE            FALSE   FALSE          FALSE
## [2,]       FALSE        FALSE            FALSE   FALSE          FALSE
##      bathroom cleaner hair spray dental care male cosmetics
## [1,]            FALSE      FALSE       FALSE          FALSE
## [2,]            FALSE      FALSE       FALSE          FALSE
##      make up remover skin care female sanitary products baby cosmetics
## [1,]           FALSE     FALSE                    FALSE          FALSE
## [2,]           FALSE     FALSE                    FALSE          FALSE
##       soap rubbing alcohol hygiene articles napkins dishes cookware
## [1,] FALSE           FALSE            FALSE   FALSE  FALSE    FALSE
## [2,] FALSE           FALSE            FALSE   FALSE  FALSE    FALSE
##      kitchen utensil cling film/bags kitchen towels house keeping products
## [1,]           FALSE           FALSE          FALSE                  FALSE
## [2,]           FALSE           FALSE          FALSE                  FALSE
##      candles light bulbs sound storage medium newspapers photo/film
## [1,]   FALSE       FALSE                FALSE      FALSE      FALSE
## [2,]   FALSE       FALSE                FALSE      FALSE      FALSE
##      pot plants flower soil/fertilizer flower (seeds) shopping bags  bags
## [1,]      FALSE                  FALSE          FALSE         FALSE FALSE
## [2,]      FALSE                  FALSE          FALSE         FALSE FALSE

This matrix shows the first two transactions. TRUE means the item was in the basket and FALSE means it was not. The first customer bought citrus fruit, semi-finished bread, margarine, and ready soups. The second bought tropical fruit, yogurt, and coffee.

itemFrequencyPlot(Groceries,topN=20,type="absolute")

These are the twenty most frequently bought items in the data set. Whole milk is the most frequent, appearing in 2,513 transactions.

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [410 rule(s)] done [0.00s].
## creating S4 object  ... done [0.02s].
inspect(rules[1:5])
##     lhs                        rhs            support     confidence
## [1] {liquor,red/blush wine} => {bottled beer} 0.001931876 0.9047619 
## [2] {curd,cereals}          => {whole milk}   0.001016777 0.9090909 
## [3] {yogurt,cereals}        => {whole milk}   0.001728521 0.8095238 
## [4] {butter,jam}            => {whole milk}   0.001016777 0.8333333 
## [5] {soups,bottled beer}    => {whole milk}   0.001118454 0.9166667 
##     lift      count
## [1] 11.235269 19   
## [2]  3.557863 10   
## [3]  3.168192 17   
## [4]  3.261374 10   
## [5]  3.587512 11

These are the first 5 of the 410 rules, in the order apriori returned them rather than sorted by any measure. The confidence column tells us that 91% of the transactions containing curd and cereals also contain whole milk. For butter and jam the figure is 83%.
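The support and lift columns follow directly from the count and confidence columns. The sketch below recomputes them in base R for rules [2] to [5], which all have whole milk on the right-hand side. It assumes the whole milk frequency of 2513 from summary(Groceries), and the variable names are illustrative only.

```r
n          <- 9835   # transactions in Groceries
milk_count <- 2513   # transactions containing whole milk

# Values copied from inspect(rules[1:5]), rules [2] to [5]
conf  <- c(0.9090909, 0.8095238, 0.8333333, 0.9166667)
count <- c(10, 17, 10, 11)

# Support = share of all transactions that contain every item in the rule
support <- count / n

# Lift = confidence divided by the overall share of baskets with whole milk
lift <- conf / (milk_count / n)

print(round(support, 9))
print(round(lift, 6))
```

A lift above 1 means whole milk shows up more often in these baskets than in the data set as a whole. Note that each of these rules rests on only 10 to 17 transactions.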

summary(rules)
## set of 410 rules
## 
## rule length distribution (lhs + rhs):sizes
##   3   4   5   6 
##  29 229 140  12 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.000   4.000   4.329   5.000   6.000 
## 
## summary of quality measures:
##     support           confidence          lift            count      
##  Min.   :0.001017   Min.   :0.8000   Min.   : 3.131   Min.   :10.00  
##  1st Qu.:0.001017   1st Qu.:0.8333   1st Qu.: 3.312   1st Qu.:10.00  
##  Median :0.001220   Median :0.8462   Median : 3.588   Median :12.00  
##  Mean   :0.001247   Mean   :0.8663   Mean   : 3.951   Mean   :12.27  
##  3rd Qu.:0.001322   3rd Qu.:0.9091   3rd Qu.: 4.341   3rd Qu.:13.00  
##  Max.   :0.003152   Max.   :1.0000   Max.   :11.235   Max.   :31.00  
## 
## mining info:
##       data ntransactions support confidence
##  Groceries          9835   0.001        0.8

There are 410 rules in total. Every rule contains between 3 and 6 items (counting both sides), and 229 of them are 4 items long.
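The minimum support of 0.001 can also be read as a transaction count, which explains the smallest values in the summary. A minimal base R sketch, with variable names of my own choosing:

```r
n        <- 9835    # transactions in Groceries
min_supp <- 0.001   # supp argument passed to apriori()

# Threshold expressed as a number of transactions (not a whole number)
threshold <- min_supp * n

# Smallest whole number of transactions that meets the threshold
min_count <- ceiling(threshold)

print(threshold)
print(min_count)
print(min_count / n)
```

This is consistent with the summary, where the minimum count is 10 and the minimum support is 0.001017. The apriori log reports an absolute minimum support count of 9, which looks like 9.835 truncated to an integer. That reading is an assumption about how the message is produced.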

Let us take a look at the rules with whole milk on the right-hand side. These show which item combinations tend to appear in the same basket as whole milk.

rulesWholemilk<-subset(rules, subset=rhs%in%"whole milk" & lift>1.2)
inspect(sort(rulesWholemilk,by="confidence")[1:5])
##     lhs                     rhs              support confidence     lift count
## [1] {rice,                                                                    
##      sugar}              => {whole milk} 0.001220132          1 3.913649    12
## [2] {canned fish,                                                             
##      hygiene articles}   => {whole milk} 0.001118454          1 3.913649    11
## [3] {root vegetables,                                                         
##      butter,                                                                  
##      rice}               => {whole milk} 0.001016777          1 3.913649    10
## [4] {root vegetables,                                                         
##      whipped/sour cream,                                                      
##      flour}              => {whole milk} 0.001728521          1 3.913649    17
## [5] {butter,                                                                  
##      soft cheese,                                                             
##      domestic eggs}      => {whole milk} 0.001016777          1 3.913649    10

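These are the top 5 rules with whole milk on the right-hand side, and every one has a confidence of 1. In this data set, every transaction that contained the left-hand-side items also contained whole milk. Each rule rests on only 10 to 17 transactions, though, so this does not mean every future customer will do the same. The lift is identical for all 5 rules because the confidence is 1 and the right-hand side is the same: lift = 1 / (2513 / 9835), about 3.91.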

The graph below maps these rules, showing the item combinations that lead to whole milk.

library(arulesViz)
## Loading required package: grid
rules3 <- head(sort(rulesWholemilk, by="lift"), 5)
plot(rules3, method="graph")

Now, let’s look at whole milk on the left-hand side. This tells us what customers are likely to buy if they purchase whole milk. I have to lower the confidence threshold to 0.15 because no rule with only whole milk on the left-hand side reaches 50% confidence.

rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                 appearance = list(default = "rhs", lhs = "whole milk"),
                 control = list(verbose = FALSE))
rules<-sort(rules, decreasing=TRUE,by="confidence")
inspect(rules[1:5])
##     lhs             rhs                support    confidence lift    
## [1] {whole milk} => {other vegetables} 0.07483477 0.2928770  1.513634
## [2] {whole milk} => {rolls/buns}       0.05663447 0.2216474  1.205032
## [3] {whole milk} => {yogurt}           0.05602440 0.2192598  1.571735
## [4] {whole milk} => {root vegetables}  0.04890696 0.1914047  1.756031
## [5] {whole milk} => {tropical fruit}   0.04229792 0.1655392  1.577595
##     count
## [1] 736  
## [2] 557  
## [3] 551  
## [4] 481  
## [5] 416

These are the top 5 rules with whole milk on the left-hand side, sorted by confidence. We can conclude that 29% of the transactions containing whole milk also contain other vegetables. This rule is backed by 736 transactions that contain both items.
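For the first three rules above, all three quality measures can be rebuilt from raw counts, because the item frequencies for other vegetables, rolls/buns, and yogurt appear in summary(Groceries). The sketch below is base R only. The data frame and its column names are hypothetical helpers, not part of arules.

```r
n          <- 9835   # total transactions
milk_count <- 2513   # transactions containing whole milk (the lhs)

rules_tbl <- data.frame(
  rhs        = c("other vegetables", "rolls/buns", "yogurt"),
  rhs_count  = c(1903, 1809, 1372),  # item frequencies from summary(Groceries)
  both_count = c(736, 557, 551)      # "count" column of the rules
)

# Share of all baskets containing both whole milk and the rhs item
rules_tbl$support <- rules_tbl$both_count / n

# Share of whole milk baskets that also contain the rhs item
rules_tbl$confidence <- rules_tbl$both_count / milk_count

# Share of all baskets containing the rhs item, with or without whole milk
rules_tbl$baseline <- rules_tbl$rhs_count / n

# Lift compares the two shares
rules_tbl$lift <- rules_tbl$confidence / rules_tbl$baseline

print(rules_tbl)
```

Other vegetables appear in about 19% of all baskets but in about 29% of whole milk baskets, and the ratio of the two is the lift of roughly 1.51.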

Here is a visualization of the 5 rules with the highest lift. It shows whole milk leading to the other grocery items.

library(arulesViz)
rules2 <- head(sort(rules, by="lift"), 5)
plot(rules2, method="graph")