library(tidyverse)
library(arules)
library(arulesViz)
library(kableExtra)
mb <- read.transactions('C:\\Users\\pgood\\OneDrive\\Documents\\DATA624\\GroceryDataSet1.csv', sep = ',')
rules <- apriori(mb, parameter=list(support=0.001, confidence=0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [5668 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
summary(mb)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 169 columns (items) and a density of 0.02609146
##
## most frequent items:
## whole milk other vegetables rolls/buns soda
## 2513 1903 1809 1715
## yogurt (Other)
## 1372 34055
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55
## 16 17 18 19 20 21 22 23 24 26 27 28 29 32
## 46 29 14 14 9 11 4 6 1 1 1 1 3 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 4.409 6.000 32.000
##
## includes extended item information - examples:
## labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3 baby cosmetics
In these groupings, “Whole Milk” seems to be involved the most. It is the most frequently purchased item, but its involvement in association rules is greater than would be expected if items were purchased independently of one another.
my_rules <- inspect(head(rules, n = 10, by ="support")) %>%
kable () %>%
kable_styling(bootstrap_options = 'striped')
## lhs rhs support confidence lift count
## [1] {other vegetables,
## yogurt} => {whole milk} 0.02226741 0.5128806 2.007235 219
## [2] {tropical fruit,
## yogurt} => {whole milk} 0.01514997 0.5173611 2.024770 149
## [3] {other vegetables,
## whipped/sour cream} => {whole milk} 0.01464159 0.5070423 1.984385 144
## [4] {root vegetables,
## yogurt} => {whole milk} 0.01453991 0.5629921 2.203354 143
## [5] {other vegetables,
## pip fruit} => {whole milk} 0.01352313 0.5175097 2.025351 133
## [6] {root vegetables,
## yogurt} => {other vegetables} 0.01291307 0.5000000 2.584078 127
## [7] {rolls/buns,
## root vegetables} => {whole milk} 0.01270971 0.5230126 2.046888 125
## [8] {domestic eggs,
## other vegetables} => {whole milk} 0.01230300 0.5525114 2.162336 121
## [9] {root vegetables,
## tropical fruit} => {other vegetables} 0.01230300 0.5845411 3.020999 121
## [10] {rolls/buns,
## root vegetables} => {other vegetables} 0.01220132 0.5020921 2.594890 120
There appears to be some value in this group of association rules. The lift is at least 2 (after rounding) in all 10 rules, and given the counts (120 to 219 transactions each), this is unlikely to be random chance. The theme of this set seems to be healthy eating, so these rules could be used to construct promotions or to lay out the store in a way that facilitates these purchases.
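As a rough check on the "not random chance" reading, the sketch below takes rule [1] from the table above and compares its observed count with the count expected if the antecedent and “Whole Milk” were purchased independently. The variable names are illustrative, and the binomial model is an assumption made here for the sake of the example, not part of the original analysis.

```r
# Values copied from rule [1] of the support-sorted table and from summary(mb).
n_transactions <- 9835
rule_support   <- 0.02226741
rule_conf      <- 0.5128806
rule_count     <- 219
milk_support   <- 2513 / n_transactions

# support(lhs) = support(rule) / confidence(rule)
lhs_support <- rule_support / rule_conf

# Under independence, P(lhs and rhs) = P(lhs) * P(rhs).
p_indep        <- lhs_support * milk_support
expected_count <- n_transactions * p_indep

# Probability of a count at least as large as the observed one
# under the (assumed) independent binomial model.
p_value <- pbinom(rule_count - 1, n_transactions, p_indep, lower.tail = FALSE)

round(expected_count, 1)      # expected co-occurrences under independence
rule_count / expected_count   # observed / expected, i.e. the rule's lift
p_value
```

The ratio of observed to expected count is the lift reported for the rule. The p-value describes a single rule in isolation; because apriori searched thousands of candidate rules, it is illustrative only and is not corrected for multiple comparisons.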
inspect(head(rules, n = 10, by ="confidence")) %>%
kable () %>%
kable_styling(bootstrap_options = 'striped')
## lhs rhs support confidence lift count
## [1] {rice,
## sugar} => {whole milk} 0.001220132 1 3.913649 12
## [2] {canned fish,
## hygiene articles} => {whole milk} 0.001118454 1 3.913649 11
## [3] {butter,
## rice,
## root vegetables} => {whole milk} 0.001016777 1 3.913649 10
## [4] {flour,
## root vegetables,
## whipped/sour cream} => {whole milk} 0.001728521 1 3.913649 17
## [5] {butter,
## domestic eggs,
## soft cheese} => {whole milk} 0.001016777 1 3.913649 10
## [6] {citrus fruit,
## root vegetables,
## soft cheese} => {other vegetables} 0.001016777 1 5.168156 10
## [7] {butter,
## hygiene articles,
## pip fruit} => {whole milk} 0.001016777 1 3.913649 10
## [8] {hygiene articles,
## root vegetables,
## whipped/sour cream} => {whole milk} 0.001016777 1 3.913649 10
## [9] {hygiene articles,
## pip fruit,
## root vegetables} => {whole milk} 0.001016777 1 3.913649 10
## [10] {cream cheese,
## domestic eggs,
## sugar} => {whole milk} 0.001118454 1 3.913649 11
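Because the confidence is 1, the lift is identical for every rule with “Whole Milk” as the consequent: lift is the confidence divided by the support of the consequent, which here reduces to 1 / support(“Whole Milk”). These rules aren’t particularly valuable, given their low support (10 to 17 transactions each), other than letting us know that “Whole Milk” is a Swiss Army knife.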
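The identical lift values can be reproduced by hand from the item counts printed by summary(mb). This is a minimal sketch in base R; the variable names are illustrative.

```r
# Item counts and transaction total copied from summary(mb).
n_transactions <- 9835
item_counts <- c("whole milk" = 2513, "other vegetables" = 1903)

# Support of each consequent on its own.
rhs_support <- item_counts / n_transactions

# lift = confidence / support(rhs); every rule in the table has confidence 1.
confidence <- 1
lift <- confidence / rhs_support

round(lift, 6)
```

The two values agree with the lift column above: one for the nine rules ending in “Whole Milk” and one for the single rule ending in “other vegetables”.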
inspect(head(rules, n = 10, by ="lift")) %>%
kable () %>%
kable_styling(bootstrap_options = 'striped')
## lhs rhs support confidence lift count
## [1] {Instant food products,
## soda} => {hamburger meat} 0.001220132 0.6315789 18.99565 12
## [2] {popcorn,
## soda} => {salty snack} 0.001220132 0.6315789 16.69779 12
## [3] {baking powder,
## flour} => {sugar} 0.001016777 0.5555556 16.40807 10
## [4] {ham,
## processed cheese} => {white bread} 0.001931876 0.6333333 15.04549 19
## [5] {Instant food products,
## whole milk} => {hamburger meat} 0.001525165 0.5000000 15.03823 15
## [6] {curd,
## other vegetables,
## whipped/sour cream,
## yogurt} => {cream cheese} 0.001016777 0.5882353 14.83409 10
## [7] {domestic eggs,
## processed cheese} => {white bread} 0.001118454 0.5238095 12.44364 11
## [8] {other vegetables,
## tropical fruit,
## white bread,
## yogurt} => {butter} 0.001016777 0.6666667 12.03058 10
## [9] {hamburger meat,
## whipped/sour cream,
## yogurt} => {butter} 0.001016777 0.6250000 11.27867 10
## [10] {domestic eggs,
## other vegetables,
## tropical fruit,
## whole milk,
## yogurt} => {butter} 0.001016777 0.6250000 11.27867 10
Most of these association rules have an intuitive explanation: snack runs, sandwich materials, breakfast foods. A store owner could browse these for non-obvious rules with a plausible explanation to inform business decisions. For example, the fifth row could be explained by quick meals.
Sifting through association rules for interesting interactions can often be done more effectively with graphs.
Network Plot
We look for rules with support greater than .002 and lift greater than 5 and plot the network of these rules.
new_rules <- rules[quality(rules)$support > .002 & quality(rules)$lift > 5]
inspect(new_rules) %>%
kable () %>%
kable_styling(bootstrap_options = 'striped')
## lhs rhs support confidence lift count
## [1] {other vegetables,
## rice} => {root vegetables} 0.002236909 0.5641026 5.175325 22
## [2] {herbs,
## yogurt} => {root vegetables} 0.002033554 0.5714286 5.242537 20
## [3] {grapes,
## pip fruit} => {tropical fruit} 0.002135231 0.5675676 5.408941 21
## [4] {butter,
## hard cheese} => {whipped/sour cream} 0.002033554 0.5128205 7.154028 20
## [5] {herbs,
## other vegetables,
## whole milk} => {root vegetables} 0.002440264 0.6000000 5.504664 24
## [6] {grapes,
## other vegetables,
## whole milk} => {tropical fruit} 0.002033554 0.5263158 5.015810 20
## [7] {citrus fruit,
## frozen vegetables,
## other vegetables} => {root vegetables} 0.002033554 0.6250000 5.734025 20
## [8] {beef,
## butter,
## whole milk} => {root vegetables} 0.002033554 0.5555556 5.096911 20
## [9] {beef,
## citrus fruit,
## other vegetables} => {root vegetables} 0.002135231 0.6363636 5.838280 21
## [10] {beef,
## citrus fruit,
## whole milk} => {root vegetables} 0.002236909 0.5641026 5.175325 22
## [11] {beef,
## other vegetables,
## tropical fruit} => {root vegetables} 0.002745297 0.6136364 5.629770 27
## [12] {beef,
## tropical fruit,
## whole milk} => {root vegetables} 0.002541942 0.5555556 5.096911 25
## [13] {beef,
## other vegetables,
## soda} => {root vegetables} 0.002033554 0.5714286 5.242537 20
## [14] {bottled water,
## root vegetables,
## yogurt} => {tropical fruit} 0.002236909 0.5789474 5.517391 22
## [15] {butter,
## other vegetables,
## whole milk,
## yogurt} => {tropical fruit} 0.002338587 0.5348837 5.097463 23
## [16] {citrus fruit,
## other vegetables,
## tropical fruit,
## whole milk} => {root vegetables} 0.003152008 0.6326531 5.804238 31
## [17] {citrus fruit,
## other vegetables,
## root vegetables,
## whole milk} => {tropical fruit} 0.003152008 0.5438596 5.183004 31
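The filter above indexes the rule set with a logical vector built from quality(rules). A possibly more readable alternative is the subset() method that arules provides for rules, sketched below. It assumes that the Groceries data set bundled with arules can stand in for GroceryDataSet1.csv; strong_rules and indexed_rules are illustrative names.

```r
library(arules)

# Assumption: the bundled Groceries data stands in for GroceryDataSet1.csv.
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.5),
                 control = list(verbose = FALSE))

# subset() evaluates the condition against the rules' quality measures.
strong_rules <- subset(rules, subset = support > 0.002 & lift > 5)

# The same selection written with explicit indexing, as in the text.
indexed_rules <- rules[quality(rules)$support > 0.002 & quality(rules)$lift > 5]

# Both approaches should select the same number of rules.
length(strong_rules) == length(indexed_rules)
```

subset() also accepts conditions on the rule sides, such as rhs %in% "root vegetables", which makes it convenient for narrowing a large rule set before plotting.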
plot(new_rules, method = "graph")
The core of the network is difficult to follow, but it gives us an idea of which products are associated with many different items and which are more insular.
Matrix Plot
The matrix plot allows us to visualize more rules simultaneously, but offers less information about each rule. Here, we relax the support requirement to .0015.
new_rules <- rules[quality(rules)$support > .0015 & quality(rules)$lift > 5]
plot(new_rules, method = "matrix", measure = "lift")
## Itemsets in Antecedent (LHS)
## [1] "{ham,processed cheese}"
## [2] "{Instant food products,whole milk}"
## [3] "{liquor,red/blush wine}"
## [4] "{citrus fruit,cream cheese,whole milk}"
## [5] "{flour,root vegetables,whole milk}"
## [6] "{ham,other vegetables,tropical fruit}"
## [7] "{butter,hard cheese}"
## [8] "{frozen meals,pip fruit,whole milk}"
## [9] "{rice,yogurt}"
## [10] "{other vegetables,rice,whole milk}"
## [11] "{flour,whipped/sour cream,whole milk}"
## [12] "{oil,other vegetables,tropical fruit}"
## [13] "{bottled water,root vegetables,whole milk,yogurt}"
## [14] "{fruit/vegetable juice,root vegetables,tropical fruit}"
## [15] "{beef,other vegetables,tropical fruit,whole milk}"
## [16] "{herbs,tropical fruit,whole milk}"
## [17] "{butter,onions,other vegetables}"
## [18] "{citrus fruit,oil,other vegetables}"
## [19] "{ham,other vegetables,pip fruit}"
## [20] "{beef,citrus fruit,other vegetables}"
## [21] "{citrus fruit,other vegetables,tropical fruit,whole milk}"
## [22] "{citrus fruit,other vegetables,root vegetables,yogurt}"
## [23] "{cream cheese,curd,other vegetables}"
## [24] "{beef,rolls/buns,tropical fruit}"
## [25] "{herbs,rolls/buns,whole milk}"
## [26] "{citrus fruit,frozen vegetables,other vegetables}"
## [27] "{root vegetables,turkey}"
## [28] "{domestic eggs,root vegetables,whole milk,yogurt}"
## [29] "{butter,curd,tropical fruit}"
## [30] "{pastry,rolls/buns,tropical fruit,whole milk}"
## [31] "{beef,other vegetables,tropical fruit}"
## [32] "{herbs,tropical fruit}"
## [33] "{bottled water,root vegetables,yogurt}"
## [34] "{herbs,rolls/buns}"
## [35] "{herbs,other vegetables,whole milk}"
## [36] "{other vegetables,processed cheese,whole milk}"
## [37] "{oil,tropical fruit,whole milk}"
## [38] "{fruit/vegetable juice,other vegetables,pork}"
## [39] "{grapes,pip fruit}"
## [40] "{ham,other vegetables,yogurt}"
## [41] "{other vegetables,sliced cheese,yogurt}"
## [42] "{butter,root vegetables,whole milk,yogurt}"
## [43] "{tropical fruit,turkey}"
## [44] "{herbs,yogurt}"
## [45] "{beef,other vegetables,sausage}"
## [46] "{beef,other vegetables,soda}"
## [47] "{citrus fruit,other vegetables,pip fruit,whole milk}"
## [48] "{butter,onions,whole milk}"
## [49] "{beef,sausage,whole milk}"
## [50] "{citrus fruit,other vegetables,root vegetables,whole milk}"
## [51] "{other vegetables,rice}"
## [52] "{beef,citrus fruit,whole milk}"
## [53] "{soft cheese,tropical fruit,whole milk}"
## [54] "{cream cheese,root vegetables,whipped/sour cream}"
## [55] "{margarine,tropical fruit,whipped/sour cream}"
## [56] "{root vegetables,sausage,whipped/sour cream}"
## [57] "{rolls/buns,tropical fruit,whipped/sour cream,whole milk}"
## [58] "{butter,other vegetables,whole milk,yogurt}"
## [59] "{beef,butter,yogurt}"
## [60] "{beef,butter,whole milk}"
## [61] "{beef,tropical fruit,whole milk}"
## [62] "{hygiene articles,pip fruit,whole milk}"
## [63] "{bottled water,whipped/sour cream,yogurt}"
## [64] "{butter,root vegetables,tropical fruit,whole milk}"
## [65] "{canned beer,rolls/buns,soda}"
## [66] "{citrus fruit,herbs}"
## [67] "{citrus fruit,pork,whole milk}"
## [68] "{coffee,tropical fruit,whole milk}"
## [69] "{citrus fruit,other vegetables,tropical fruit,yogurt}"
## [70] "{grapes,other vegetables,whole milk}"
## Itemsets in Consequent (RHS)
## [1] "{shopping bags}" "{yogurt}" "{tropical fruit}"
## [4] "{root vegetables}" "{citrus fruit}" "{whipped/sour cream}"
## [7] "{pip fruit}" "{domestic eggs}" "{bottled beer}"
## [10] "{hamburger meat}" "{white bread}"