This data set contains 30 days of point-of-sale transaction data from a typical local grocery outlet: 9,835 transactions covering 169 items. We use Market Basket Analysis to find which items tend to be bought together, so that we can predict what a customer is likely to buy based on the items already in the basket.
library("arules")
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
data(Groceries)
summary(Groceries)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 169 columns (items) and a density of 0.02609146
##
## most frequent items:
## whole milk other vegetables rolls/buns soda
## 2513 1903 1809 1715
## yogurt (Other)
## 1372 34055
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55
## 16 17 18 19 20 21 22 23 24 26 27 28 29 32
## 46 29 14 14 9 11 4 6 1 1 1 1 3 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 4.409 6.000 32.000
##
## includes extended item information - examples:
## labels level2 level1
## 1 frankfurter sausage meat and sausage
## 2 sausage sausage meat and sausage
## 3 liver loaf sausage meat and sausage
There are 9,835 rows, each of which is a transaction, and 169 columns, each of which is a distinct item. The most frequently bought items are whole milk, other vegetables, rolls/buns, soda, and yogurt.
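The density and the mean basket size reported by summary() both follow from the item counts. The sketch below only re-uses numbers copied from the output above (the variable names are illustrative) and should reproduce the reported density of about 0.0261 and mean of about 4.409 items per transaction.

```r
# Counts copied from the summary(Groceries) output above.
n_transactions <- 9835
n_items <- 169
item_counts <- c("whole milk" = 2513, "other vegetables" = 1903,
                 "rolls/buns" = 1809, "soda" = 1715,
                 "yogurt" = 1372, "(Other)" = 34055)

# Every purchased item is one filled cell in the transactions-by-items matrix.
total_purchases <- sum(item_counts)

# Density: the share of cells that are filled.
density <- total_purchases / (n_transactions * n_items)

# Mean basket size: purchased items per transaction.
mean_basket <- total_purchases / n_transactions

print(total_purchases)
print(round(density, 8))
print(round(mean_basket, 3))
```

A density this low means that almost all cells are empty, which is why arules stores the transactions in sparse format.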
aa <- as(Groceries, "matrix") # converts the sparse transactions object into a dense logical incidence matrix
aa[1:2,]
## frankfurter sausage liver loaf ham meat finished products
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## organic sausage chicken turkey pork beef hamburger meat fish
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## citrus fruit tropical fruit pip fruit grapes berries nuts/prunes
## [1,] TRUE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE TRUE FALSE FALSE FALSE FALSE
## root vegetables onions herbs other vegetables
## [1,] FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE
## packaged fruit/vegetables whole milk butter curd dessert butter milk
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## yogurt whipped/sour cream beverages UHT-milk condensed milk cream
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] TRUE FALSE FALSE FALSE FALSE FALSE
## soft cheese sliced cheese hard cheese cream cheese processed cheese
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## spread cheese curd cheese specialty cheese mayonnaise salad dressing
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## tidbits frozen vegetables frozen fruits frozen meals frozen fish
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## frozen chicken ice cream frozen dessert frozen potato products
## [1,] FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE
## domestic eggs rolls/buns white bread brown bread pastry
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## roll products semi-finished bread zwieback potato products flour
## [1,] FALSE TRUE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## salt rice pasta vinegar oil margarine specialty fat sugar
## [1,] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## artif. sweetener honey mustard ketchup spices soups ready soups
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Instant food products sauces cereals organic products baking powder
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## preservation products pudding powder canned vegetables canned fruit
## [1,] FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE
## pickled vegetables specialty vegetables jam sweet spreads
## [1,] FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE
## meat spreads canned fish dog food cat food pet care baby food coffee
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## instant coffee tea cocoa drinks bottled water soda misc. beverages
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## fruit/vegetable juice syrup bottled beer canned beer brandy whisky
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## liquor rum liqueur liquor (appetizer) white wine red/blush wine
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## prosecco sparkling wine salty snack popcorn nut snack snack products
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## long life bakery product waffles cake bar chewing gum chocolate
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## cooking chocolate specialty chocolate specialty bar
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## chocolate marshmallow candy seasonal products detergent softener
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## decalcifier dish cleaner abrasive cleaner cleaner toilet cleaner
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## bathroom cleaner hair spray dental care male cosmetics
## [1,] FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE
## make up remover skin care female sanitary products baby cosmetics
## [1,] FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE
## soap rubbing alcohol hygiene articles napkins dishes cookware
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## kitchen utensil cling film/bags kitchen towels house keeping products
## [1,] FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE
## candles light bulbs sound storage medium newspapers photo/film
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## pot plants flower soil/fertilizer flower (seeds) shopping bags bags
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
This matrix shows the first two transactions at the grocery store: FALSE means the item was not purchased and TRUE means it was. The first customer bought citrus fruit, semi-finished bread, margarine, and ready soups. The second bought tropical fruit, yogurt, and coffee.
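Scanning 169 TRUE/FALSE columns by eye is error-prone, so it can help to let R list the purchased items. The sketch below uses a small toy matrix (a hypothetical `toy` object holding only seven of the columns, with values copied from the two rows printed above) to show the indexing idea.

```r
# Toy logical incidence matrix with a few of the 169 columns;
# the values are copied from the two rows printed above.
items <- c("citrus fruit", "tropical fruit", "yogurt",
           "semi-finished bread", "margarine", "ready soups", "coffee")
toy <- matrix(c(TRUE,  FALSE, FALSE, TRUE,  TRUE,  TRUE,  FALSE,
                FALSE, TRUE,  TRUE,  FALSE, FALSE, FALSE, TRUE),
              nrow = 2, byrow = TRUE, dimnames = list(NULL, items))

# Indexing the column names with a logical row keeps only the purchased items.
basket_1 <- colnames(toy)[toy[1, ]]
basket_2 <- colnames(toy)[toy[2, ]]
print(basket_1)
print(basket_2)

# Row sums of a logical matrix give the basket sizes.
print(rowSums(toy))
```

The same indexing applied to the full matrix, `colnames(aa)[aa[1, ]]`, lists the items of the first transaction.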
itemFrequencyPlot(Groceries,topN=20,type="absolute")
This plot shows the twenty most frequently bought items in this data set. Whole milk is the most frequent item, appearing in 2,513 transactions.
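Absolute counts are easier to compare with the support values used later once they are divided by the number of transactions. A minimal sketch, using the five counts printed by summary(Groceries) (the variable names are illustrative):

```r
# Counts copied from the summary(Groceries) output.
n_transactions <- 9835
top_counts <- c("whole milk" = 2513, "other vegetables" = 1903,
                "rolls/buns" = 1809, "soda" = 1715, "yogurt" = 1372)

# Relative frequency (support) of each single item.
relative_freq <- top_counts / n_transactions
print(round(relative_freq, 4))
```

Calling itemFrequencyPlot() with type="relative" plots these proportions instead of the raw counts.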
rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [410 rule(s)] done [0.00s].
## creating S4 object ... done [0.02s].
inspect(rules[1:5])
## lhs rhs support confidence
## [1] {liquor,red/blush wine} => {bottled beer} 0.001931876 0.9047619
## [2] {curd,cereals} => {whole milk} 0.001016777 0.9090909
## [3] {yogurt,cereals} => {whole milk} 0.001728521 0.8095238
## [4] {butter,jam} => {whole milk} 0.001016777 0.8333333
## [5] {soups,bottled beer} => {whole milk} 0.001118454 0.9166667
## lift count
## [1] 11.235269 19
## [2] 3.557863 10
## [3] 3.168192 17
## [4] 3.261374 10
## [5] 3.587512 11
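These are the first 5 rules in the result, which is not sorted, so they are not necessarily the strongest rules. The confidence column tells us that 91% of the transactions containing curd and cereals also contain whole milk, and that 83% of the transactions containing butter and jam also contain whole milk.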
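To make the three quality measures concrete, the sketch below recomputes them by hand for the rule {curd,cereals} => {whole milk}. The count of 10 and the whole-milk count of 2513 come from the output above; the left-hand-side count of 11 is not printed anywhere and is an assumption inferred from the confidence 0.9090909 = 10/11.

```r
n_transactions <- 9835
count_both <- 10    # transactions with curd, cereals and whole milk (count column)
count_lhs  <- 11    # assumption: inferred from confidence 10/11, not printed above
count_rhs  <- 2513  # transactions with whole milk, from summary(Groceries)

# Support: share of all transactions that contain every item in the rule.
support <- count_both / n_transactions

# Confidence: share of left-hand-side transactions that also contain the right-hand side.
confidence <- count_both / count_lhs

# Lift: confidence relative to the baseline frequency of the right-hand side.
lift <- confidence / (count_rhs / n_transactions)

print(c(support = support, confidence = confidence, lift = lift))
```

A lift of about 3.56 says that whole milk appears roughly 3.6 times as often in baskets with curd and cereals as in baskets overall.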
summary(rules)
## set of 410 rules
##
## rule length distribution (lhs + rhs):sizes
## 3 4 5 6
## 29 229 140 12
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 4.000 4.000 4.329 5.000 6.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.001017 Min. :0.8000 Min. : 3.131 Min. :10.00
## 1st Qu.:0.001017 1st Qu.:0.8333 1st Qu.: 3.312 1st Qu.:10.00
## Median :0.001220 Median :0.8462 Median : 3.588 Median :12.00
## Mean :0.001247 Mean :0.8663 Mean : 3.951 Mean :12.27
## 3rd Qu.:0.001322 3rd Qu.:0.9091 3rd Qu.: 4.341 3rd Qu.:13.00
## Max. :0.003152 Max. :1.0000 Max. :11.235 Max. :31.00
##
## mining info:
## data ntransactions support confidence
## Groceries 9835 0.001 0.8
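This shows that there are 410 rules in total. Most of them are short: 229 rules contain 4 grocery items (left-hand and right-hand side combined).
Let us take a look at rules with whole milk on the right-hand side. These show which items in a basket make it likely that the basket also contains whole milk. Association rules describe co-occurrence within a transaction, not the order in which items were picked up.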
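The rule length distribution can be checked against the other numbers in the summary. This sketch re-enters the four counts printed above (the variable names are illustrative) and recomputes the total and the mean rule length.

```r
# Number of rules by length (lhs + rhs), copied from summary(rules).
rule_lengths <- c("3" = 29, "4" = 229, "5" = 140, "6" = 12)

total_rules <- sum(rule_lengths)

# Weighted mean of the lengths, with the rule counts as weights.
mean_length <- sum(as.numeric(names(rule_lengths)) * rule_lengths) / total_rules

# Share of rules that contain exactly 4 items.
share_len4 <- rule_lengths[["4"]] / total_rules

print(total_rules)
print(round(mean_length, 3))
print(round(share_len4, 3))
```

Rules with 4 items make up a little over half of the rule set, and there are no rules with fewer than 3 items at this confidence threshold.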
rulesWholemilk<-subset(rules, subset=rhs%in%"whole milk" & lift>1.2)
inspect(sort(rulesWholemilk,by="confidence")[1:5])
## lhs rhs support confidence lift count
## [1] {rice,
## sugar} => {whole milk} 0.001220132 1 3.913649 12
## [2] {canned fish,
## hygiene articles} => {whole milk} 0.001118454 1 3.913649 11
## [3] {root vegetables,
## butter,
## rice} => {whole milk} 0.001016777 1 3.913649 10
## [4] {root vegetables,
## whipped/sour cream,
## flour} => {whole milk} 0.001728521 1 3.913649 17
## [5] {butter,
## soft cheese,
## domestic eggs} => {whole milk} 0.001016777 1 3.913649 10
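These are the top 5 rules with whole milk on the right-hand side, sorted by confidence. All of them have a confidence of 1, which means that in this data set every transaction containing the left-hand-side items also contained whole milk. Each rule rests on only 10 to 17 transactions, so this does not show that all such customers will buy whole milk. The lift is the same for all 5 rules because lift is confidence divided by the support of the right-hand side, and both are identical across these rules.
The graph below maps the five whole-milk rules with the highest lift, showing which grocery items lead to whole milk.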
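The identical lift values are a direct consequence of the definition of lift. A short sketch, using only the whole-milk count from summary(Groceries) and the five confidences printed above:

```r
n_transactions <- 9835
count_whole_milk <- 2513

# Support of the right-hand side, which is the same for every whole-milk rule.
support_rhs <- count_whole_milk / n_transactions

# The five rules all have confidence 1, so they share one lift value.
confidences <- rep(1, 5)
lifts <- confidences / support_rhs
print(round(lifts, 6))
```

With the right-hand side fixed, ranking rules by lift and ranking them by confidence give the same order, so lift adds no extra information within this subset.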
library(arulesViz)
## Loading required package: grid
rules3 <- head(sort(rulesWholemilk, by="lift"), 5)
plot(rules3, method="graph")
Now, let’s look at whole milk on the left-hand side. This tells us what customers are likely to buy if they purchase whole milk. I had to lower the minimum confidence to 0.15 because no rule with only whole milk on the left-hand side reached a confidence of 50%.
rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                 appearance = list(default = "rhs", lhs = "whole milk"),
                 control = list(verbose = FALSE))
rules<-sort(rules, decreasing=TRUE,by="confidence")
inspect(rules[1:5])
## lhs rhs support confidence lift
## [1] {whole milk} => {other vegetables} 0.07483477 0.2928770 1.513634
## [2] {whole milk} => {rolls/buns} 0.05663447 0.2216474 1.205032
## [3] {whole milk} => {yogurt} 0.05602440 0.2192598 1.571735
## [4] {whole milk} => {root vegetables} 0.04890696 0.1914047 1.756031
## [5] {whole milk} => {tropical fruit} 0.04229792 0.1655392 1.577595
## count
## [1] 736
## [2] 557
## [3] 551
## [4] 481
## [5] 416
These are the top 5 rules, by confidence, with whole milk on the left-hand side. We can conclude that 29% of the transactions containing whole milk also contain other vegetables. This rule is supported by 736 transactions that contain both items.
Here is a visualization of the five rules with the highest lift. It shows whole milk leading to other grocery items being bought.
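Confidence and lift do not rank these rules in the same order. The sketch below re-enters the five rows printed above into a plain data frame (a hypothetical `milk_rules` object, not the arules result) and re-sorts them by lift. It covers only the five rules shown here; sorting the full rule set by lift may surface rules that are not in this table.

```r
# Values copied from the inspect(rules[1:5]) output above.
milk_rules <- data.frame(
  rhs = c("other vegetables", "rolls/buns", "yogurt",
          "root vegetables", "tropical fruit"),
  confidence = c(0.2928770, 0.2216474, 0.2192598, 0.1914047, 0.1655392),
  lift = c(1.513634, 1.205032, 1.571735, 1.756031, 1.577595),
  count = c(736, 557, 551, 481, 416),
  stringsAsFactors = FALSE
)

# Confidence is the rule count divided by the 2513 whole-milk transactions.
milk_rules$confidence_check <- milk_rules$count / 2513

# Re-rank the same five rules by lift instead of confidence.
by_lift <- milk_rules[order(milk_rules$lift, decreasing = TRUE), ]
print(by_lift[, c("rhs", "confidence", "lift")])
```

Root vegetables move to the top when ranked by lift, while rolls/buns drop to the bottom: they are bought with whole milk often mainly because they are bought often in general.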
library(arulesViz)
rules2 <- head(sort(rules, by="lift"), 5)
plot(rules2, method="graph")