This document uses market basket analysis to identify which groceries shoppers typically buy together in a single trip to the grocery store. The data analyzed come from groceries.csv, in which each line lists the items in one transaction.
library(arules)
## Warning: package 'arules' was built under R version 4.0.3
##
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
##
## recode
## The following objects are masked from 'package:base':
##
## abbreviate, write
# Read each line of groceries.csv as one basket of comma-separated items
groceries <- read.transactions("groceries.csv", format = "basket", sep = ",")
inspect(head(groceries))
##     items
## [1] {citrus fruit,
##      margarine,
##      ready soups,
##      semi-finished bread}
## [2] {coffee,
##      tropical fruit,
##      yogurt}
## [3] {whole milk}
## [4] {cream cheese,
##      meat spreads,
##      pip fruit,
##      yogurt}
## [5] {condensed milk,
##      long life bakery product,
##      other vegetables,
##      whole milk}
## [6] {abrasive cleaner,
##      butter,
##      rice,
##      whole milk,
##      yogurt}
summary(groceries)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 169 columns (items) and a density of 0.02609146
##
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda
##             2513             1903             1809             1715
##           yogurt          (Other)
##             1372            34055
##
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46
##   17   18   19   20   21   22   23   24   26   27   28   29   32
##   29   14   14    9   11    4    6    1    1    1    1    3    1
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   3.000   4.409   6.000  32.000
##
## includes extended item information - examples:
## labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3 baby cosmetics
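The density reported by summary() can be checked by hand: it is the share of filled cells in the 9,835 by 169 transaction matrix. The base R sketch below recomputes it, along with the mean basket size, from the figures printed above. The variable names are illustrative and not part of the original analysis.

```r
# Figures copied from the summary() output above
n_transactions <- 9835
n_items <- 169
# Counts of the five most frequent items plus the "(Other)" bucket
item_counts <- c(2513, 1903, 1809, 1715, 1372, 34055)

# Total number of items bought across all baskets
total_items <- sum(item_counts)

# Density: filled cells divided by all cells in the transaction matrix
density <- total_items / (n_transactions * n_items)

# Mean basket size: items bought per transaction
mean_basket_size <- total_items / n_transactions

round(density, 8)           # compare with the reported density
round(mean_basket_size, 3)  # compare with the reported mean
```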
itemFrequency(groceries, type = 'absolute')
## abrasive cleaner artif. sweetener baby cosmetics
## 35 32 6
## baby food bags baking powder
## 1 4 174
## bathroom cleaner beef berries
## 27 516 327
## beverages bottled beer bottled water
## 256 792 1087
## brandy brown bread butter
## 41 638 545
## butter milk cake bar candles
## 275 130 88
## candy canned beer canned fish
## 294 764 148
## canned fruit canned vegetables cat food
## 32 106 229
## cereals chewing gum chicken
## 56 207 422
## chocolate chocolate marshmallow citrus fruit
## 488 89 814
## cleaner cling film/bags cocoa drinks
## 50 112 22
## coffee condensed milk cooking chocolate
## 571 101 25
## cookware cream cream cheese
## 27 13 390
## curd curd cheese decalcifier
## 524 50 15
## dental care dessert detergent
## 57 365 189
## dish cleaner dishes dog food
## 103 173 84
## domestic eggs female sanitary products finished products
## 624 60 64
## fish flour flower (seeds)
## 29 171 102
## flower soil/fertilizer frankfurter frozen chicken
## 19 580 6
## frozen dessert frozen fish frozen fruits
## 106 115 12
## frozen meals frozen potato products frozen vegetables
## 279 83 473
## fruit/vegetable juice grapes hair spray
## 711 220 11
## ham hamburger meat hard cheese
## 256 327 241
## herbs honey house keeping products
## 160 15 82
## hygiene articles ice cream instant coffee
## 324 246 73
## Instant food products jam ketchup
## 79 53 42
## kitchen towels kitchen utensil light bulbs
## 59 4 41
## liqueur liquor liquor (appetizer)
## 9 109 78
## liver loaf long life bakery product make up remover
## 50 368 8
## male cosmetics margarine mayonnaise
## 45 576 90
## meat meat spreads misc. beverages
## 254 42 279
## mustard napkins newspapers
## 118 515 785
## nut snack nuts/prunes oil
## 31 33 276
## onions organic products organic sausage
## 305 16 22
## other vegetables packaged fruit/vegetables pasta
## 1903 128 148
## pastry pet care photo/film
## 875 93 91
## pickled vegetables pip fruit popcorn
## 176 744 71
## pork potato products potted plants
## 567 28 170
## preservation products processed cheese prosecco
## 2 163 20
## pudding powder ready soups red/blush wine
## 23 18 189
## rice roll products rolls/buns
## 75 101 1809
## root vegetables rubbing alcohol rum
## 1072 10 44
## salad dressing salt salty snack
## 8 106 372
## sauces sausage seasonal products
## 54 924 140
## semi-finished bread shopping bags skin care
## 174 969 35
## sliced cheese snack products soap
## 241 30 26
## soda soft cheese softener
## 1715 168 54
## sound storage medium soups sparkling wine
## 1 67 55
## specialty bar specialty cheese specialty chocolate
## 269 84 299
## specialty fat specialty vegetables spices
## 36 17 51
## spread cheese sugar sweet spreads
## 110 333 89
## syrup tea tidbits
## 32 38 23
## toilet cleaner tropical fruit turkey
## 7 1032 80
## UHT-milk vinegar waffles
## 329 64 378
## whipped/sour cream whisky white bread
## 705 8 414
## white wine whole milk yogurt
## 187 2513 1372
## zwieback
## 68
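# Plot the 20 most frequently purchased items by absolute count
itemFrequencyPlot(groceries, topN = 20, type = "absolute", main = "Grocery Item Frequency")
# Mine rules with minimum support 0.1 and minimum confidence 0.1
grocrules <- apriori(groceries, parameter = list(supp = 0.1, conf = 0.1))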
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 983
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [8 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [8 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
grocrules <- sort(grocrules, by = "support", decreasing = TRUE)
inspect(grocrules)
## lhs rhs support confidence coverage lift count
## [1] {} => {whole milk} 0.2555160 0.2555160 1 1 2513
## [2] {} => {other vegetables} 0.1934926 0.1934926 1 1 1903
## [3] {} => {rolls/buns} 0.1839349 0.1839349 1 1 1809
## [4] {} => {soda} 0.1743772 0.1743772 1 1 1715
## [5] {} => {yogurt} 0.1395018 0.1395018 1 1 1372
## [6] {} => {bottled water} 0.1105236 0.1105236 1 1 1087
## [7] {} => {root vegetables} 0.1089985 0.1089985 1 1 1072
## [8] {} => {tropical fruit} 0.1049314 0.1049314 1 1 1032
grocrules <- sort(grocrules, by = "confidence", decreasing = TRUE)
inspect(grocrules)
## lhs rhs support confidence coverage lift count
## [1] {} => {whole milk} 0.2555160 0.2555160 1 1 2513
## [2] {} => {other vegetables} 0.1934926 0.1934926 1 1 1903
## [3] {} => {rolls/buns} 0.1839349 0.1839349 1 1 1809
## [4] {} => {soda} 0.1743772 0.1743772 1 1 1715
## [5] {} => {yogurt} 0.1395018 0.1395018 1 1 1372
## [6] {} => {bottled water} 0.1105236 0.1105236 1 1 1087
## [7] {} => {root vegetables} 0.1089985 0.1089985 1 1 1072
## [8] {} => {tropical fruit} 0.1049314 0.1049314 1 1 1032
# Every rule has coverage 1, so this sort leaves the order unchanged
grocrules <- sort(grocrules, by = "coverage", decreasing = TRUE)
inspect(grocrules)
## lhs rhs support confidence coverage lift count
## [1] {} => {whole milk} 0.2555160 0.2555160 1 1 2513
## [2] {} => {other vegetables} 0.1934926 0.1934926 1 1 1903
## [3] {} => {rolls/buns} 0.1839349 0.1839349 1 1 1809
## [4] {} => {soda} 0.1743772 0.1743772 1 1 1715
## [5] {} => {yogurt} 0.1395018 0.1395018 1 1 1372
## [6] {} => {bottled water} 0.1105236 0.1105236 1 1 1087
## [7] {} => {root vegetables} 0.1089985 0.1089985 1 1 1072
## [8] {} => {tropical fruit} 0.1049314 0.1049314 1 1 1032
# Every rule has lift 1, so sorting by lift only reorders ties
grocrules <- sort(grocrules, by = "lift", decreasing = TRUE)
inspect(grocrules)
## lhs rhs support confidence coverage lift count
## [1] {} => {rolls/buns} 0.1839349 0.1839349 1 1 1809
## [2] {} => {yogurt} 0.1395018 0.1395018 1 1 1372
## [3] {} => {whole milk} 0.2555160 0.2555160 1 1 2513
## [4] {} => {other vegetables} 0.1934926 0.1934926 1 1 1903
## [5] {} => {soda} 0.1743772 0.1743772 1 1 1715
## [6] {} => {bottled water} 0.1105236 0.1105236 1 1 1087
## [7] {} => {root vegetables} 0.1089985 0.1089985 1 1 1072
## [8] {} => {tropical fruit} 0.1049314 0.1049314 1 1 1032
grocrules <- sort(grocrules, by = "count", decreasing = TRUE)
inspect(grocrules)
## lhs rhs support confidence coverage lift count
## [1] {} => {whole milk} 0.2555160 0.2555160 1 1 2513
## [2] {} => {other vegetables} 0.1934926 0.1934926 1 1 1903
## [3] {} => {rolls/buns} 0.1839349 0.1839349 1 1 1809
## [4] {} => {soda} 0.1743772 0.1743772 1 1 1715
## [5] {} => {yogurt} 0.1395018 0.1395018 1 1 1372
## [6] {} => {bottled water} 0.1105236 0.1105236 1 1 1087
## [7] {} => {root vegetables} 0.1089985 0.1089985 1 1 1072
## [8] {} => {tropical fruit} 0.1049314 0.1049314 1 1 1032
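All eight rules above have an empty LHS, which is why coverage and lift are 1 and confidence equals support: such a rule only restates how often the RHS item occurs. One way to exclude them is the minlen parameter of apriori (shown as minlen 1 in the parameter listing above). The sketch below illustrates the effect on a small set of made-up baskets; the toy data and object names are assumptions for illustration, not part of the original analysis.

```r
library(arules)

# Hypothetical toy baskets, used only to keep the example self-contained
baskets <- list(
  c("whole milk", "yogurt"),
  c("whole milk", "rolls/buns"),
  c("whole milk", "yogurt", "soda"),
  c("soda"),
  c("yogurt", "whole milk")
)
toy <- as(baskets, "transactions")

# Default minlen = 1 allows rules of the form {} => {item}
rules_all <- apriori(toy,
                     parameter = list(supp = 0.2, conf = 0.2),
                     control = list(verbose = FALSE))

# minlen = 2 requires at least one item on each side of the rule
rules_min2 <- apriori(toy,
                      parameter = list(supp = 0.2, conf = 0.2, minlen = 2),
                      control = list(verbose = FALSE))

# Rules with an empty LHS always have lift 1
empty_lhs <- size(lhs(rules_all)) == 0
quality(rules_all)$lift[empty_lhs]

inspect(rules_min2)
```

On the grocery data, a minimum support of 0.1 leaves only eight frequent single items, so a lower support threshold would likely be needed as well before minlen = 2 returns any rules.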
# Mine rules that predict whole milk: any item may appear on the LHS, only whole milk on the RHS
rules.mw <- apriori(data = groceries, parameter = list(supp = 0.001, conf = 0.08),
                    appearance = list(default = "lhs", rhs = "whole milk"),
                    control = list(verbose = FALSE))
rules.mw.byconf <- sort(rules.mw, by = "confidence", decreasing = TRUE)
inspect(head(rules.mw.byconf))
## lhs rhs support confidence coverage lift count
## [1] {rice,
## sugar} => {whole milk} 0.001220132 1 0.001220132 3.913649 12
## [2] {canned fish,
## hygiene articles} => {whole milk} 0.001118454 1 0.001118454 3.913649 11
## [3] {butter,
## rice,
## root vegetables} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
## [4] {flour,
## root vegetables,
## whipped/sour cream} => {whole milk} 0.001728521 1 0.001728521 3.913649 17
## [5] {butter,
## domestic eggs,
## soft cheese} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
## [6] {butter,
## hygiene articles,
## pip fruit} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
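# Mine rules with whole milk on the LHS and any item on the RHS
grocrules.mw <- apriori(data = groceries, parameter = list(supp = 0.001, conf = 0.08),
                        appearance = list(default = "rhs", lhs = "whole milk"),
                        control = list(verbose = FALSE))
# Note: the sort below is applied to rules.mw (whole milk on the RHS), not grocrules.mw,
# and orders by ascending support, so the output lists the lowest-support rules predicting whole milk
grocrules.mw.byconf <- sort(rules.mw, by = "support", decreasing = FALSE)
inspect(head(grocrules.mw.byconf))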
## lhs rhs support confidence
## [1] {sparkling wine} => {whole milk} 0.001016777 0.1818182
## [2] {liver loaf,yogurt} => {whole milk} 0.001016777 0.6666667
## [3] {curd cheese,rolls/buns} => {whole milk} 0.001016777 0.6250000
## [4] {cleaner,other vegetables} => {whole milk} 0.001016777 0.6250000
## [5] {cereals,curd} => {whole milk} 0.001016777 0.9090909
## [6] {cereals,root vegetables} => {whole milk} 0.001016777 0.7692308
## coverage lift count
## [1] 0.005592272 0.7115726 10
## [2] 0.001525165 2.6090994 10
## [3] 0.001626843 2.4460306 10
## [4] 0.001626843 2.4460306 10
## [5] 0.001118454 3.5578628 10
## [6] 0.001321810 3.0104993 10
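The output above still has whole milk on the RHS because the sorted object is rules.mw. If the goal is to see what shoppers buy given that they bought whole milk, a sketch along the following lines could be used. It assumes the Groceries data set bundled with arules can stand in for groceries.csv, and the object names and the minlen = 2 setting are illustrative choices rather than the original author's.

```r
library(arules)

# Assumption: the bundled Groceries data stands in for groceries.csv
data("Groceries")

# Whole milk fixed on the LHS, any other item allowed on the RHS;
# minlen = 2 drops rules with an empty LHS
milk_lhs <- apriori(Groceries,
                    parameter = list(supp = 0.001, conf = 0.08, minlen = 2),
                    appearance = list(default = "rhs", lhs = "whole milk"),
                    control = list(verbose = FALSE))

# Sort the whole-milk-on-LHS rules themselves, strongest confidence first
milk_lhs_byconf <- sort(milk_lhs, by = "confidence", decreasing = TRUE)
inspect(head(milk_lhs_byconf))
```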
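The data set contains 9,835 transactions covering 169 distinct items. Whole milk was the most frequently purchased item, appearing in 2,513 transactions (about 26%), followed by other vegetables, rolls/buns, soda, and yogurt. The item frequency list shows that items such as baby food, kitchen utensils, and preservation products were bought least often, each appearing in fewer than five transactions. With minimum support and confidence both set to 0.1, apriori produced 8 rules, each with an empty LHS. For such rules confidence equals support, and coverage and lift are both 1, so the rules simply restate the frequencies of the eight most common items. Sorting by support, confidence, coverage, or count gave the same order. Sorting by lift gave a different order, but because every lift is displayed as 1, that difference reflects ties rather than any difference in association strength. Lowering the minimum support to 0.001 and restricting the RHS to whole milk produced rules whose top confidence values reach 1 with a lift of about 3.9, although each of those rules rests on only 10 to 17 transactions.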
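The measures reported in the rule tables can be reproduced from raw counts. The base R sketch below does this for two of the whole milk rules listed earlier, using the standard definitions of support, coverage, confidence, and lift. The helper function rule_metrics is hypothetical, and the LHS counts are inferred from the printed coverage values.

```r
n <- 9835           # total transactions
count_milk <- 2513  # transactions containing whole milk

# Hypothetical helper: derive rule quality measures from counts
rule_metrics <- function(count_rule, count_lhs, count_rhs, n) {
  support    <- count_rule / n           # share of baskets with LHS and RHS
  coverage   <- count_lhs / n            # share of baskets with the LHS
  confidence <- count_rule / count_lhs   # share of LHS baskets that also hold the RHS
  lift       <- confidence / (count_rhs / n)
  c(support = support, coverage = coverage,
    confidence = confidence, lift = lift)
}

# {rice, sugar} => {whole milk}: count 12 and confidence 1,
# so the LHS also occurs in 12 baskets
rice_sugar <- rule_metrics(12, 12, count_milk, n)

# {cereals, curd} => {whole milk}: count 10; coverage 0.001118454
# times 9835 gives 11 baskets containing the LHS
cereals_curd <- rule_metrics(10, 11, count_milk, n)

round(rice_sugar, 6)
round(cereals_curd, 6)
```

A lift above 1 means whole milk appears more often in baskets containing the LHS than in baskets overall. With counts this small, a single extra basket changes the measures noticeably.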