This article is part of the Unsupervised Learning course at the Faculty of Economic Sciences, University of Warsaw.

Purpose

In this short paper, I investigate sales patterns in the transaction data of “The Bread Basket” bakery in Edinburgh, using association rules mined with the Apriori algorithm, hierarchical rules and visualisations. The dataset comes from Kaggle: https://www.kaggle.com/mittalvasu95/the-bread-basket

Initial Data Analysis

As Chart 1 below shows, the data contain a large number of distinct items (94), many of which have unclear names. In the next step we therefore map all lunch-related names to “lunch”, cookie-related names to “cookies”, and collapse a few other small groups in the same way. As Chart 2 shows, after this transformation the product list is shorter and the distribution is more evenly spread.

library(readr)      # read_csv(), read_delim()
library(ggplot2)    # charts
library(arules)     # read.transactions(), apriori()
library(arulesViz)  # rule graphs

# load the raw data and strip stray quote characters from item names
Bakery <- read_csv("Bakery.csv")
Bakery <- data.frame(lapply(Bakery, function(x) {gsub("'", "", x)}))

ggplot(Bakery, aes(x = reorder(Items, Items, function(x) -length(x)))) +
  geom_bar() +
  labs(x = "Items", y = "transactions count") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  ggtitle("Chart 1. Raw products count")

# dictionary mapping each raw item name (Var1) to a cleaned name (Item) and a category
dict <- read_delim("dict.csv", delim = ";", 
                   escape_double = FALSE, trim_ws = TRUE)
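
The contents of dict.csv are not reproduced here; from the way it is used, it is assumed to map each raw item name (Var1) to a cleaned name (Item) and a broader Category. The rows in the comment below are hypothetical examples that only illustrate the expected shape:

# assumed dict.csv layout (semicolon-separated); hypothetical example rows only:
#   Var1;Item;Category
#   Chicken sand;Sandwich;Lunch food
#   Scandinavian;Bread;Pastries
head(dict)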

# attach the cleaned item names to every transaction row
Bakery <- merge(x = Bakery, y = dict, by.x = "Items", by.y = "Var1", all.x = TRUE)



ggplot(Bakery, aes(x = reorder(Item, Item, function(x)-length(x)))) +
  geom_bar()+
  labs(x = "Items", y = "transactions count") +
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
  ggtitle("Chart 2. Modified products count")

Reading Transactions

As the head of the transaction list and Chart 3 show, the dataset mostly contains transactions with a small number of products. Almost half of the transactions contain just one item and, as calculated below, only 25.36% of the transactions consist of more than two products. We therefore expect rules involving only a few products.

# read the cleaned data in "single" format: one row per (transaction, item) pair
transactions <- read.transactions("Bakery2.csv", format = "single", sep = ",", header = TRUE, cols = c("TransactionNo", "Items"))

inspect(head(transactions))
##     items                     transactionID
## [1] {Bread}                   1            
## [2] {Brunch, Croissant}       10           
## [3] {Bread}                   100          
## [4] {Brunch, Chimichurri Oil} 1000         
## [5] {Bread, Truffles}         1001         
## [6] {Brunch, Cake}            1002
rules_size <- as.data.frame(size(transactions))
ggplot(rules_size, aes(x = size(transactions))) +
  geom_histogram(binwidth = 1, color = "white", fill = "dark grey") +
  ggtitle("Chart 3. Number of items in transactions")

# calculating rate of transactions with more than 2 products
sum(rules_size$`size(transactions)` > 2)/nrow(rules_size)
## [1] 0.2535658

Creating rules

Considering the large number of products and the scarcity of transactions with many products, we set a low minimum support of 0.01 (the lhs and rhs products must appear together in at least 1% of transactions) and a minimum confidence of 0.4 (the share of transactions containing the lhs products that also contain the rhs product), obtaining 15 rules.
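
As a side sanity check on these definitions (a sketch, not part of the original workflow), support, confidence and lift can also be computed by hand for a single pair of items, for example Toast and Coffee, from the co-occurrence counts returned by crossTable():

ct <- crossTable(transactions, measure = "count")        # pairwise co-occurrence counts
n <- length(transactions)                                # number of transactions
supp_tc <- ct["Toast", "Coffee"] / n                     # support of {Toast, Coffee}
conf_tc <- ct["Toast", "Coffee"] / ct["Toast", "Toast"]  # confidence of {Toast} => {Coffee}
lift_tc <- conf_tc / (ct["Coffee", "Coffee"] / n)        # lift = confidence / support of {Coffee}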

rules.trans1<-apriori(transactions, parameter=list(supp=0.01, conf=0.4))  
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.4    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 94 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[58 item(s), 9465 transaction(s)] done [0.00s].
## sorting and recoding items ... [22 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules.by.lift<-sort(rules.trans1, by="lift", decreasing=TRUE)

inspect(rules.by.lift)
##      lhs                rhs      support    confidence coverage   lift     
## [1]  {Toast}         => {Coffee} 0.02366614 0.7044025  0.03359746 1.4724315
## [2]  {Croissant}     => {Coffee} 0.03518225 0.5692308  0.06180666 1.1898784
## [3]  {Cake, Cookies} => {Coffee} 0.01616482 0.5543478  0.02916006 1.1587681
## [4]  {Pastry}        => {Coffee} 0.04754358 0.5521472  0.08610671 1.1541682
## [5]  {Sandwich}      => {Coffee} 0.03824617 0.5323529  0.07184363 1.1127916
## [6]  {Juice}         => {Coffee} 0.02176440 0.5255102  0.04141574 1.0984881
## [7]  {Cookies}       => {Coffee} 0.07120972 0.5204633  0.13681986 1.0879385
## [8]  {Cake}          => {Coffee} 0.08166931 0.5184440  0.15752773 1.0837174
## [9]  {Hot chocolate} => {Coffee} 0.02958267 0.5072464  0.05832013 1.0603107
## [10] {Muffin}        => {Coffee} 0.01880613 0.4890110  0.03845747 1.0221928
## [11] {}              => {Coffee} 0.47839408 0.4783941  1.00000000 1.0000000
## [12] {Soup}          => {Coffee} 0.01584786 0.4601227  0.03444268 0.9618068
## [13] {Bread, Cake}   => {Coffee} 0.01605917 0.4141689  0.03877443 0.8657485
## [14] {Cookies, Tea}  => {Coffee} 0.01098785 0.4062500  0.02704702 0.8491953
## [15] {Cake, Tea}     => {Coffee} 0.01352351 0.4025157  0.03359746 0.8413894
##      count
## [1]   224 
## [2]   333 
## [3]   153 
## [4]   450 
## [5]   362 
## [6]   206 
## [7]   674 
## [8]   773 
## [9]   280 
## [10]  178 
## [11] 4528 
## [12]  150 
## [13]  152 
## [14]  104 
## [15]  128
plot(rules.trans1, method="graph",  shading="lift")

As we can see in the chart above, Coffee is the dominant product and is typically bought as an addition to other products such as pastries and sweets. We can also spot bundles of Cake, Cookies and Tea with Coffee. Worth mentioning is the rule with an empty lhs and Coffee on the rhs, whose support of 0.4784 means that 47.84% of all transactions contain Coffee. Lift, which can be read as how much more likely the products are to be bought together than would be expected if they were bought independently, is worth exploring because it indicates bundling potential; in our case it is highly correlated with confidence. Based on these two measures we can spot potential promotional snack bundles: Toast plus Coffee and Croissant plus Coffee. Interestingly, Starbucks actually offers both of these combinations at a promotional price.
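
This interpretation can be checked directly against the table above: for the strongest rule, {Toast} => {Coffee}, lift is just the rule's confidence divided by the support of the rhs item.

# lift({Toast} => {Coffee}) = confidence / support({Coffee})
0.7044025 / 0.4783941   # about 1.4724, matching the lift reported in the table above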

Hierarchical Rules

Introducing hierarchical rules means grouping individual products into broader categories, which is especially useful in this case. We again use the dictionary loaded earlier, in which all products are grouped into six categories: Hot Drinks, Cold Drinks, Pastries, Sweets, Lunch food and Other.

# attach each item's Category to the item metadata, then aggregate items into their categories
itemInfo(transactions) <- merge(x = itemInfo(transactions), y = unique(dict[,c(2:3)]), by.x = "labels", by.y = "Item")
transactions_level2 <- aggregate(transactions, by = "Category")
inspect(head(transactions_level2))
##     items                transactionID
## [1] {Pastries}           1            
## [2] {Lunch food, Sweets} 10           
## [3] {Pastries}           100          
## [4] {Lunch food, Other}  1000         
## [5] {Pastries, Sweets}   1001         
## [6] {Lunch food, Sweets} 1002
rules.trans1_lev2<-apriori(transactions_level2, parameter=list(supp=0.01, conf=0.4))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.4    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 94 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 9465 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [18 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules.by.conf<-sort(rules.trans1_lev2, by="confidence", decreasing=TRUE) 
inspect(head(rules.by.conf, 10))
##      lhs                               rhs          support    confidence
## [1]  {Lunch food, Other}            => {Hot Drinks} 0.01077655 0.7183099 
## [2]  {Cold Drinks, Sweets}          => {Hot Drinks} 0.02577919 0.6873239 
## [3]  {Lunch food, Sweets}           => {Hot Drinks} 0.04183835 0.6491803 
## [4]  {Sweets}                       => {Hot Drinks} 0.23634443 0.6267862 
## [5]  {Lunch food}                   => {Hot Drinks} 0.13143159 0.6050584 
## [6]  {}                             => {Hot Drinks} 0.59387216 0.5938722 
## [7]  {Cold Drinks, Hot Drinks}      => {Sweets}     0.02577919 0.5823389 
## [8]  {Lunch food, Pastries, Sweets} => {Hot Drinks} 0.01162176 0.5820106 
## [9]  {Cold Drinks, Pastries}        => {Hot Drinks} 0.01109350 0.5801105 
## [10] {Cold Drinks, Pastries}        => {Sweets}     0.01098785 0.5745856 
##      coverage   lift      count
## [1]  0.01500264 1.2095362  102 
## [2]  0.03750660 1.1573601  244 
## [3]  0.06444797 1.0931314  396 
## [4]  0.37707343 1.0554228 2237 
## [5]  0.21722134 1.0188360 1244 
## [6]  1.00000000 1.0000000 5621 
## [7]  0.04426836 1.5443647  244 
## [8]  0.01996830 0.9800267  110 
## [9]  0.01912309 0.9768272  105 
## [10] 0.01912309 1.5238030  104
plot(rules.trans1_lev2, method="graph", cex=0.7, shading="lift")
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

As we can see, the rules now contain more elements and the chart above presents more complex patterns, although the centre is again Hot Drinks (driven mostly by Coffee). Hot Drinks are now also likely to be bundled with combinations such as Cold Drinks and Sweets or Lunch food and Sweets, as well as with Sweets or Lunch food alone. One of the top rules involves the Other category, which was created for smaller products and those that were hard to identify; it is not a natural bucket and should not be over-interpreted. Offering hot beverages as an addition to lunch is also common practice across various restaurants.

Conclusions

In the conducted analysis we identified rules that can be observed in real-world practice, such as bundling coffee with a croissant, a sandwich or lunch. Unfortunately, due to the custom product names, the derived categories may not be fully accurate; for a more reliable interpretation, the official product descriptions should be obtained. Further analysis could also group transactions by weekday/weekend and by time of day, which could help propose offers tailored to a particular time of day.
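
A minimal sketch of what such a split could look like, assuming the raw Kaggle file contains a DayType column with a "Weekend" value (both the column name and the value are assumptions to be verified against the actual file):

# hypothetical sketch: mine rules for weekend transactions only
Bakery_raw <- read_csv("Bakery.csv")
weekend_ids <- unique(Bakery_raw$TransactionNo[Bakery_raw$DayType == "Weekend"])   # assumed column/value
weekend_trans <- transactions[transactionInfo(transactions)$transactionID %in% as.character(weekend_ids)]
rules_weekend <- apriori(weekend_trans, parameter = list(supp = 0.01, conf = 0.4))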