Introduction

To discover patterns in customer behavior, many companies, including small enterprises, study associations between purchased products. This helps them recognize which products are frequently bought together. Such knowledge may contribute to a significant rise in profits, for example through promotions or packages that sell particular products jointly, or even through product placement in the store, since where products are displayed influences customers’ decisions. The field of Data Science that covers these activities is usually called Association Rule Mining, and the rules describing which products customers tend to buy together are called Association Rules.

In this paper, using the Apriori and Eclat algorithms, I will analyze association rules and discover patterns that occur in the examined dataset. To enrich the analysis, I will also examine rules involving two frequently bought products: the classic croissant and americano.

Examined Dataset

Our dataset comes from Kaggle and contains delivery data from a bakery in Korea (https://www.kaggle.com/hosubjeong/bakery-sales). The data were gathered from 11 July 2019 to 18 June 2020. Because of missing observations, the analysis uses the first 2420 transactions.

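library(arules)     # read.transactions(), itemFrequency(), eclat(), apriori(), inspect()
library(arulesViz)  # plot() methods for rule sets, used later

bakery <- read.transactions("bakerysales1.csv", format = "basket", sep = ";", skip = 0)  # one order per line
bakery <- bakery[1:2420]  # keep the first 2420 transactions
inspect(head(bakery))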

##     items                                                
## [1] {americano,angbutter,tiramisu.croissant,vanila.latte}
## [2] {angbutter,orange.pound,tiramisu.croissant}          
## [3] {tiramisu.croissant}                                 
## [4] {angbutter,plain.bread,vanila.latte}                 
## [5] {angbutter,tiramisu.croissant}                       
## [6] {angbutter,milk.tea,vanila.latte}
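With format = "basket", read.transactions() treats each line of the file as one order and splits it into items at the separator given by sep. The base-R sketch below only illustrates that idea. It assumes the lines of bakerysales1.csv look like the sample strings here, which are reconstructed from the orders printed above rather than read from the file.

```r
# Hypothetical sample lines in basket format: one order per line, items separated by ";"
lines <- c("americano;angbutter;tiramisu.croissant;vanila.latte",
           "angbutter;orange.pound;tiramisu.croissant",
           "tiramisu.croissant")

# Split each line into items; an itemset holds each item once, and the items
# are sorted alphabetically, as inspect() prints them
baskets <- lapply(strsplit(lines, ";", fixed = TRUE),
                  function(b) sort(unique(trimws(b))))

# Number of items in each order
sizes <- lengths(baskets)
print(sizes)  # [1] 4 3 1
```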

Frequency

Now let’s examine the most frequently bought products among the available ones.

itemFrequency(bakery,type="absolute")

##   almond.croissant          americano          angbutter          berry.ade 
##                202                412               1973                 54 
##         cacao.deep        caffe.latte        cheese.cake          croissant 
##                323                193                 90                747 
##    gateau.chocolat                jam           lemonade   merinque.cookies 
##                196                220                 35                 47 
##           milk.tea       orange.pound   pain.au.chocolat            pandoro 
##                137                519                587                343 
##        plain.bread           tiramisu tiramisu.croissant       vanila.latte 
##                857                  7                779                209 
##             wiener 
##                355

Type = Relative

itemFrequencyPlot(bakery, topN=12, type="relative", main="Item Frequency", col="purple")

Type = Absolute

itemFrequencyPlot(bakery, topN=12, type="absolute", main="Item Frequency", col="purple")

As both plots show, angbutter (a pretzel filled with red beans and gourmet butter) was bought in the largest number of orders (1973). It is followed by plain bread (857) and the two croissants, tiramisu (779) and classic (747). Among beverages, americano was the most popular: it was ordered 412 times.
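Relative frequency is the absolute count divided by the number of transactions, so the two plots differ only in scale. It is also the support of the single-item itemset, which is why the same values reappear in the Eclat output. A quick check in base R, using a few counts copied from the itemFrequency() output:

```r
# Absolute counts copied from the itemFrequency() output above (selected items)
counts <- c(angbutter = 1973, plain.bread = 857, tiramisu.croissant = 779,
            croissant = 747, americano = 412)
n_transactions <- 2420

# Relative frequency = absolute count / number of transactions
relative <- counts / n_transactions
print(round(relative, 4))
```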

Measures in association rules

To assess the strength of a rule, we can use the Support and Confidence measures. Support tells us how often a rule is applicable to a given dataset. Confidence measures the reliability of the inference made by the rule; it can also be viewed as the conditional probability of Y given X (https://www-users.cs.umn.edu/~kumar001/dmbook/ch6.pdf). In simpler words, the support of a rule X => Y is the ratio of the number of orders in which X and Y were bought together to the total number of orders, and its confidence is the probability of buying Y given that X is already in the basket.

Two other commonly used metrics are Expected confidence and Lift. Expected confidence is the confidence the rule would have if the antecedent and the consequent were independent, which equals the support of the consequent. Lift is the ratio of confidence to expected confidence. (https://pub.towardsai.net/association-discovery-the-apriori-algorithm-28c1e71e0f04)
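To make the four measures concrete, the sketch below computes them in base R for the rule {tiramisu.croissant} => {angbutter} on the six orders printed by inspect(head(bakery)). The helper names itemset_support() and rule_measures() are illustrative, not part of arules.

```r
# The six orders printed by inspect(head(bakery)) above
baskets <- list(
  c("americano", "angbutter", "tiramisu.croissant", "vanila.latte"),
  c("angbutter", "orange.pound", "tiramisu.croissant"),
  c("tiramisu.croissant"),
  c("angbutter", "plain.bread", "vanila.latte"),
  c("angbutter", "tiramisu.croissant"),
  c("angbutter", "milk.tea", "vanila.latte")
)

# Support of an itemset: share of orders that contain all of its items
itemset_support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Measures of the rule lhs => rhs
rule_measures <- function(lhs, rhs, baskets) {
  supp_xy  <- itemset_support(c(lhs, rhs), baskets)
  conf     <- supp_xy / itemset_support(lhs, baskets)  # P(rhs | lhs)
  exp_conf <- itemset_support(rhs, baskets)            # confidence if lhs and rhs were independent
  c(support = supp_xy, confidence = conf,
    expected_confidence = exp_conf, lift = conf / exp_conf)
}

m <- rule_measures("tiramisu.croissant", "angbutter", baskets)
print(round(m, 4))
```

Here 3 of the 4 orders with a tiramisu croissant also contain angbutter, so confidence is 0.75. Angbutter is in 5 of the 6 orders, so the expected confidence is about 0.833 and lift is 0.75 / (5/6) = 0.9.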

Eclat Algorithm

Our next step is to find frequently bought baskets (itemsets). To do this, we use Eclat, an algorithm that searches a dataset for the most frequent itemsets. It does not generate rules. For each itemset, we also obtain a measure of its frequency, usually support. I set the minimum support to 0.15.

freq.items<-eclat(bakery, parameter=list(supp=0.15, maxlen=15))

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.15      1     15 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 363 
## 
## create itemset ... 
## set transactions ...[21 item(s), 2420 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating bit matrix ... [7 row(s), 2420 column(s)] done [0.00s].
## writing  ... [12 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].

inspect(freq.items)

##      items                          support   transIdenticalToItemsets count
## [1]  {angbutter,orange.pound}       0.1677686  406                      406 
## [2]  {angbutter,pain.au.chocolat}   0.1818182  440                      440 
## [3]  {angbutter,tiramisu.croissant} 0.2483471  601                      601 
## [4]  {angbutter,croissant}          0.2305785  558                      558 
## [5]  {angbutter,plain.bread}        0.2677686  648                      648 
## [6]  {angbutter}                    0.8152893 1973                     1973 
## [7]  {plain.bread}                  0.3541322  857                      857 
## [8]  {croissant}                    0.3086777  747                      747 
## [9]  {tiramisu.croissant}           0.3219008  779                      779 
## [10] {pain.au.chocolat}             0.2425620  587                      587 
## [11] {orange.pound}                 0.2144628  519                      519 
## [12] {americano}                    0.1702479  412                      412

According to the results, angbutter, plain bread and tiramisu croissant are the most frequently ordered products in this bakery. Among itemsets with more than one product, every frequent pair contains angbutter; the most frequent pair is {angbutter, plain.bread}, with support 0.268.
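Eclat stores the data vertically: for every item it keeps the list of transaction IDs (tid-list) containing it, and it obtains the support of a larger itemset by intersecting tid-lists in a depth-first search. The base-R sketch below illustrates this on the six orders printed earlier, with a hypothetical minimum count of 3 orders (support 0.5); in the run above, the threshold was 0.15 × 2420 = 363 orders, as the log reports. eclat_sketch() is an illustrative helper, not the arules implementation.

```r
# The six orders printed by inspect(head(bakery)) above
baskets <- list(
  c("americano", "angbutter", "tiramisu.croissant", "vanila.latte"),
  c("angbutter", "orange.pound", "tiramisu.croissant"),
  c("tiramisu.croissant"),
  c("angbutter", "plain.bread", "vanila.latte"),
  c("angbutter", "tiramisu.croissant"),
  c("angbutter", "milk.tea", "vanila.latte")
)

# Vertical layout: for each item, the IDs of the orders that contain it
items <- sort(unique(unlist(baskets)))
tidlists <- lapply(setNames(items, items), function(it)
  which(vapply(baskets, function(b) it %in% b, logical(1))))

# Depth-first search: extend a prefix with each later item by intersecting
# tid-lists; the support count of an itemset is the length of its tid-list
eclat_sketch <- function(prefix, candidates, min_count, out) {
  for (i in seq_along(candidates)) {
    tids <- candidates[[i]]
    if (length(tids) < min_count) next
    itemset <- c(prefix, names(candidates)[i])
    out[[paste(itemset, collapse = ",")]] <- length(tids)
    # Tid-lists of the extended itemsets, for the items after position i
    rest <- candidates[-seq_len(i)]
    extended <- lapply(rest, function(t) intersect(tids, t))
    out <- eclat_sketch(itemset, extended, min_count, out)
  }
  out
}

min_count <- 3  # hypothetical threshold: 3 of the 6 orders
freq <- unlist(eclat_sketch(character(0), tidlists, min_count, list()))
print(freq)
```

On these orders it keeps angbutter, tiramisu.croissant and vanila.latte and the pairs {angbutter, tiramisu.croissant} and {angbutter, vanila.latte}; the pair {tiramisu.croissant, vanila.latte} shares only order 1 and is discarded.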

Apriori Algorithm

Apriori has the same aim as Eclat: it looks for the most frequent itemsets in the database, but it additionally generates association rules from these itemsets. These rules describe relations between items.
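Apriori’s characteristic idea is a level-wise search: frequent k-itemsets are joined into (k+1)-item candidates, candidates with an infrequent subset are pruned without counting (every subset of a frequent itemset must itself be frequent), and the remaining candidates are counted. The base-R sketch below illustrates only this frequent-itemset phase on the six orders printed earlier, with a hypothetical minimum count of 3; apriori_sketch() and support_count() are illustrative helpers, not the arules implementation.

```r
# The six orders printed by inspect(head(bakery)) above
baskets <- list(
  c("americano", "angbutter", "tiramisu.croissant", "vanila.latte"),
  c("angbutter", "orange.pound", "tiramisu.croissant"),
  c("tiramisu.croissant"),
  c("angbutter", "plain.bread", "vanila.latte"),
  c("angbutter", "tiramisu.croissant"),
  c("angbutter", "milk.tea", "vanila.latte")
)

# Number of orders that contain every item of the itemset
support_count <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

apriori_sketch <- function(baskets, min_count) {
  items <- sort(unique(unlist(baskets)))
  # Level 1: frequent single items (each itemset is a sorted character vector)
  level <- Filter(function(s) support_count(s, baskets) >= min_count, as.list(items))
  frequent <- level
  k <- 1
  while (length(level) > 1) {
    # Join: merge two frequent k-itemsets that share their first k - 1 items
    candidates <- list()
    for (i in seq_len(length(level) - 1)) {
      for (j in (i + 1):length(level)) {
        a <- level[[i]]
        b <- level[[j]]
        if (identical(a[seq_len(k - 1)], b[seq_len(k - 1)])) {
          candidates[[length(candidates) + 1]] <- sort(union(a, b))
        }
      }
    }
    # Prune: every k-subset of a candidate must itself be frequent
    keys <- vapply(level, paste, character(1), collapse = ",")
    all_subsets_frequent <- function(cand) {
      all(vapply(seq_along(cand),
                 function(d) paste(cand[-d], collapse = ",") %in% keys,
                 logical(1)))
    }
    candidates <- Filter(all_subsets_frequent, candidates)
    # Count: keep the candidates that reach the minimum support count
    level <- Filter(function(cand) support_count(cand, baskets) >= min_count, candidates)
    frequent <- c(frequent, level)
    k <- k + 1
  }
  frequent
}

freq_sets <- apriori_sketch(baskets, min_count = 3)  # hypothetical threshold: 3 of 6 orders
print(vapply(freq_sets, paste, character(1), collapse = ","))
```

With a minimum of 3 orders, the only 3-item candidate, {angbutter, tiramisu.croissant, vanila.latte}, is pruned without being counted, because its subset {tiramisu.croissant, vanila.latte} is not frequent.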

Now let’s generate the rules from all of the itemsets. This algorithm also requires threshold values as input. After examining the dataset, I set the minimum support to 0.1 and the minimum confidence to 0.2.
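Rule generation uses both thresholds: an itemset must first reach the minimum support (0.1 here), and then each rule built from it must reach the minimum confidence (0.2). The sketch below shows this step for 2-itemsets in base R, using item and pair counts copied from the outputs in this paper; the variable names are illustrative.

```r
n <- 2420
# Item counts and pair counts copied from the outputs in this paper
item_count <- c(angbutter = 1973, plain.bread = 857, tiramisu.croissant = 779,
                croissant = 747, pain.au.chocolat = 587)
pairs <- list(
  list(items = c("angbutter", "plain.bread"), count = 648),
  list(items = c("angbutter", "tiramisu.croissant"), count = 601),
  list(items = c("plain.bread", "tiramisu.croissant"), count = 246),
  list(items = c("croissant", "pain.au.chocolat"), count = 232)
)
min_supp <- 0.1
min_conf <- 0.2

rule_list <- lapply(pairs, function(p) {
  if (p$count / n < min_supp) return(NULL)  # itemset not frequent: no rules
  # A frequent pair {X, Y} gives two candidate rules: X => Y and Y => X
  do.call(rbind, lapply(1:2, function(k) {
    lhs <- p$items[k]
    rhs <- p$items[3 - k]
    conf <- p$count / item_count[[lhs]]
    data.frame(lhs = lhs, rhs = rhs, support = p$count / n, confidence = conf,
               lift = conf / (item_count[[rhs]] / n), stringsAsFactors = FALSE)
  }))
})
rules <- do.call(rbind, Filter(Negate(is.null), rule_list))
rules <- rules[rules$confidence >= min_conf, ]  # confidence threshold
print(rules, digits = 7)
```

The pair {croissant, pain.au.chocolat} (232 orders, support 0.0959) falls below the support threshold, so no rules are built from it; the six remaining rules reproduce rules [15] to [18], [23] and [24] of the apriori output.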

rb<-apriori(bakery, parameter=list(supp=0.1, conf=0.2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 242 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[21 item(s), 2420 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [24 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

inspect(rb)

##      lhs                     rhs                  support   confidence
## [1]  {}                   => {orange.pound}       0.2144628 0.2144628 
## [2]  {}                   => {pain.au.chocolat}   0.2425620 0.2425620 
## [3]  {}                   => {tiramisu.croissant} 0.3219008 0.3219008 
## [4]  {}                   => {croissant}          0.3086777 0.3086777 
## [5]  {}                   => {plain.bread}        0.3541322 0.3541322 
## [6]  {}                   => {angbutter}          0.8152893 0.8152893 
## [7]  {cacao.deep}         => {angbutter}          0.1012397 0.7585139 
## [8]  {pandoro}            => {angbutter}          0.1090909 0.7696793 
## [9]  {wiener}             => {angbutter}          0.1103306 0.7521127 
## [10] {americano}          => {angbutter}          0.1347107 0.7912621 
## [11] {orange.pound}       => {angbutter}          0.1677686 0.7822736 
## [12] {angbutter}          => {orange.pound}       0.1677686 0.2057780 
## [13] {pain.au.chocolat}   => {angbutter}          0.1818182 0.7495741 
## [14] {angbutter}          => {pain.au.chocolat}   0.1818182 0.2230106 
## [15] {tiramisu.croissant} => {plain.bread}        0.1016529 0.3157895 
## [16] {plain.bread}        => {tiramisu.croissant} 0.1016529 0.2870478 
## [17] {tiramisu.croissant} => {angbutter}          0.2483471 0.7715019 
## [18] {angbutter}          => {tiramisu.croissant} 0.2483471 0.3046123 
## [19] {croissant}          => {plain.bread}        0.1136364 0.3681392 
## [20] {plain.bread}        => {croissant}          0.1136364 0.3208868 
## [21] {croissant}          => {angbutter}          0.2305785 0.7469880 
## [22] {angbutter}          => {croissant}          0.2305785 0.2828180 
## [23] {plain.bread}        => {angbutter}          0.2677686 0.7561260 
## [24] {angbutter}          => {plain.bread}        0.2677686 0.3284339 
##      coverage  lift      count
## [1]  1.0000000 1.0000000  519 
## [2]  1.0000000 1.0000000  587 
## [3]  1.0000000 1.0000000  779 
## [4]  1.0000000 1.0000000  747 
## [5]  1.0000000 1.0000000  857 
## [6]  1.0000000 1.0000000 1973 
## [7]  0.1334711 0.9303617  245 
## [8]  0.1417355 0.9440567  264 
## [9]  0.1466942 0.9225102  267 
## [10] 0.1702479 0.9705293  326 
## [11] 0.2144628 0.9595044  406 
## [12] 0.8152893 0.9595044  406 
## [13] 0.2425620 0.9193965  440 
## [14] 0.8152893 0.9193965  440 
## [15] 0.3219008 0.8917276  246 
## [16] 0.3541322 0.8917276  246 
## [17] 0.3219008 0.9462923  601 
## [18] 0.8152893 0.9462923  601 
## [19] 0.3086777 1.0395530  275 
## [20] 0.3541322 1.0395530  275 
## [21] 0.3086777 0.9162245  558 
## [22] 0.8152893 0.9162245  558 
## [23] 0.3541322 0.9274328  648 
## [24] 0.8152893 0.9274328  648

Below, the rules are sorted by three different measures.

Confidence

rc<-sort(rb, by="confidence", decreasing=TRUE)
inspect(head(rc))

##     lhs                     rhs         support   confidence coverage 
## [1] {}                   => {angbutter} 0.8152893 0.8152893  1.0000000
## [2] {americano}          => {angbutter} 0.1347107 0.7912621  0.1702479
## [3] {orange.pound}       => {angbutter} 0.1677686 0.7822736  0.2144628
## [4] {tiramisu.croissant} => {angbutter} 0.2483471 0.7715019  0.3219008
## [5] {pandoro}            => {angbutter} 0.1090909 0.7696793  0.1417355
## [6] {cacao.deep}         => {angbutter} 0.1012397 0.7585139  0.1334711
##     lift      count
## [1] 1.0000000 1973 
## [2] 0.9705293  326 
## [3] 0.9595044  406 
## [4] 0.9462923  601 
## [5] 0.9440567  264 
## [6] 0.9303617  245

Lift

rl<-sort(rb, by="lift", decreasing=TRUE) 
inspect(head(rl))

##     lhs              rhs                  support   confidence coverage 
## [1] {croissant}   => {plain.bread}        0.1136364 0.3681392  0.3086777
## [2] {plain.bread} => {croissant}          0.1136364 0.3208868  0.3541322
## [3] {}            => {orange.pound}       0.2144628 0.2144628  1.0000000
## [4] {}            => {pain.au.chocolat}   0.2425620 0.2425620  1.0000000
## [5] {}            => {tiramisu.croissant} 0.3219008 0.3219008  1.0000000
## [6] {}            => {croissant}          0.3086777 0.3086777  1.0000000
##     lift     count
## [1] 1.039553 275  
## [2] 1.039553 275  
## [3] 1.000000 519  
## [4] 1.000000 587  
## [5] 1.000000 779  
## [6] 1.000000 747

plot(rb, method="matrix", measure="lift")

## Itemsets in Antecedent (LHS)
##  [1] "{}"                   "{croissant}"          "{americano}"         
##  [4] "{orange.pound}"       "{plain.bread}"        "{pandoro}"           
##  [7] "{angbutter}"          "{cacao.deep}"         "{wiener}"            
## [10] "{pain.au.chocolat}"   "{tiramisu.croissant}"
## Itemsets in Consequent (RHS)
## [1] "{angbutter}"          "{tiramisu.croissant}" "{pain.au.chocolat}"  
## [4] "{plain.bread}"        "{orange.pound}"       "{croissant}"

Support

rs<-sort(rb, by="support", decreasing=TRUE) 
inspect(head(rs))

##     lhs              rhs                  support   confidence coverage 
## [1] {}            => {angbutter}          0.8152893 0.8152893  1.0000000
## [2] {}            => {plain.bread}        0.3541322 0.3541322  1.0000000
## [3] {}            => {tiramisu.croissant} 0.3219008 0.3219008  1.0000000
## [4] {}            => {croissant}          0.3086777 0.3086777  1.0000000
## [5] {plain.bread} => {angbutter}          0.2677686 0.7561260  0.3541322
## [6] {angbutter}   => {plain.bread}        0.2677686 0.3284339  0.8152893
##     lift      count
## [1] 1.0000000 1973 
## [2] 1.0000000  857 
## [3] 1.0000000  779 
## [4] 1.0000000  747 
## [5] 0.9274328  648 
## [6] 0.9274328  648

Rules can also be visualized as plots; some examples follow.

plot(rb, method="grouped")

plot(rb, method="graph")

plot(rb, method="paracoord", control=list(reorder=TRUE))

As expected, the items that occur most often in the dataset also have the highest support and confidence. By confidence, the most significant item is angbutter: it is the consequent of all six top rules.

By support, the highest values belong to rules with an empty antecedent ({} => item), whose support is simply the frequency of the item. Angbutter also appears in both of the top rules with a non-empty antecedent, {plain.bread} => {angbutter} and {angbutter} => {plain.bread}.

By lift, the two strongest rules are {croissant} => {plain.bread} and {plain.bread} => {croissant}. That they share the same value is expected, because lift is a symmetric measure (X => Y and Y => X have the same lift). A lift of about 1.04 means that these two products are bought together about 1.04 times as often as would be expected if they were unrelated (class materials from Unsupervised Learning). A value this close to 1 indicates only a weak association.
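The symmetry of lift and the meaning of the value 1.04 can be checked from the counts in the apriori output: 747 orders contain a croissant, 857 contain plain bread, and 275 contain both. A short base-R check:

```r
n <- 2420
count_croissant <- 747  # orders with croissant
count_plain     <- 857  # orders with plain.bread
count_both      <- 275  # orders with both

supp <- function(count) count / n

lift_c_to_p <- (count_both / count_croissant) / supp(count_plain)  # {croissant} => {plain.bread}
lift_p_to_c <- (count_both / count_plain) / supp(count_croissant)  # {plain.bread} => {croissant}
# Both reduce to supp(X and Y) / (supp(X) * supp(Y)), hence the symmetry
lift_sym <- supp(count_both) / (supp(count_croissant) * supp(count_plain))

# Orders expected to contain both items if they were independent
expected_both <- count_croissant * count_plain / n

print(c(lift_c_to_p, lift_p_to_c, lift_sym, expected_both))
```

If the two products were independent, about 264.5 orders would contain both; 275 were observed, and 275 / 264.5 ≈ 1.04 is exactly the lift.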

To dig deeper into the rules, I will examine rules for two particular products: the classic croissant and americano. The croissant is one of the most purchased products, and americano is the most popular beverage. This choice is natural, since sweet snacks are often sold in a package with a hot beverage.

Croissants

The measure I use to assess the strength of the rules is confidence.

Croissant as Consequent

rc <- apriori(data = bakery, parameter = list(supp = 0.001, conf = 0.2),
              appearance = list(default = "lhs", rhs = "croissant"),
              control = list(verbose = FALSE))
rcb<-sort(rc, by="confidence", decreasing=TRUE)
inspect(head(rcb))

##     lhs                     rhs             support confidence    coverage     lift count
## [1] {orange.pound,                                                                       
##      pandoro,                                                                            
##      plain.bread,                                                                        
##      wiener}             => {croissant} 0.001239669       1.00 0.001239669 3.239625     3
## [2] {angbutter,                                                                          
##      orange.pound,                                                                       
##      pandoro,                                                                            
##      plain.bread,                                                                        
##      wiener}             => {croissant} 0.001239669       1.00 0.001239669 3.239625     3
## [3] {almond.croissant,                                                                   
##      angbutter,                                                                          
##      jam,                                                                                
##      pain.au.chocolat}   => {croissant} 0.001652893       0.80 0.002066116 2.591700     4
## [4] {almond.croissant,                                                                   
##      angbutter,                                                                          
##      pandoro,                                                                            
##      plain.bread}        => {croissant} 0.001652893       0.80 0.002066116 2.591700     4
## [5] {almond.croissant,                                                                   
##      angbutter,                                                                          
##      pain.au.chocolat,                                                                   
##      tiramisu.croissant} => {croissant} 0.001652893       0.80 0.002066116 2.591700     4
## [6] {almond.croissant,                                                                   
##      cheese.cake,                                                                        
##      pain.au.chocolat}   => {croissant} 0.001239669       0.75 0.001652893 2.429719     3

Croissant as Antecedent

rc <- apriori(data = bakery, parameter = list(supp = 0.001, conf = 0.2),
              appearance = list(default = "rhs", lhs = "croissant"),
              control = list(verbose = FALSE))
rcb<-sort(rc, by="confidence", decreasing=TRUE)
inspect(head(rcb))

##     lhs            rhs                  support    confidence coverage 
## [1] {}          => {angbutter}          0.81528926 0.8152893  1.0000000
## [2] {croissant} => {angbutter}          0.23057851 0.7469880  0.3086777
## [3] {croissant} => {plain.bread}        0.11363636 0.3681392  0.3086777
## [4] {}          => {plain.bread}        0.35413223 0.3541322  1.0000000
## [5] {}          => {tiramisu.croissant} 0.32190083 0.3219008  1.0000000
## [6] {croissant} => {pain.au.chocolat}   0.09586777 0.3105756  0.3086777
##     lift      count
## [1] 1.0000000 1973 
## [2] 0.9162245  558 
## [3] 1.0395530  275 
## [4] 1.0000000  857 
## [5] 1.0000000  779 
## [6] 1.2803970  232

As we can see, when croissant is the consequent, it is part of much more complex orders, in which customers also bought other types of croissants, such as almond or tiramisu croissants. The highest confidence (1.00) was reached for the basket with orange pound, pandoro, plain bread and wiener (with or without angbutter), although each of these rules is supported by only 3 orders.

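With croissant as the antecedent, the strongest relations are with the most popular items (angbutter and plain bread) and also with pain au chocolat, which has the highest lift in this group (1.28). Rules with an empty antecedent ({}) also appear here; they simply report how often each item is bought overall.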

Americano

Americano as Consequent

americano <- apriori(data = bakery, parameter = list(supp = 0.001, conf = 0.2),
                     appearance = list(default = "lhs", rhs = "americano"),
                     control = list(verbose = FALSE))
ra<-sort(americano, by="confidence", decreasing=TRUE)
inspect(head(ra))

##     lhs                     rhs             support confidence    coverage     lift count
## [1] {berry.ade,                                                                          
##      caffe.latte,                                                                        
##      tiramisu.croissant} => {americano} 0.001239669  1.0000000 0.001239669 5.873786     3
## [2] {angbutter,                                                                          
##      berry.ade,                                                                          
##      caffe.latte,                                                                        
##      tiramisu.croissant} => {americano} 0.001239669  1.0000000 0.001239669 5.873786     3
## [3] {angbutter,                                                                          
##      berry.ade,                                                                          
##      caffe.latte}        => {americano} 0.002066116  0.8333333 0.002479339 4.894822     5
## [4] {berry.ade,                                                                          
##      caffe.latte}        => {americano} 0.002479339  0.6666667 0.003719008 3.915858     6
## [5] {lemonade,                                                                           
##      orange.pound}       => {americano} 0.001239669  0.6000000 0.002066116 3.524272     3
## [6] {berry.ade,                                                                          
##      cacao.deep}         => {americano} 0.001239669  0.6000000 0.002066116 3.524272     3

Americano as Antecedent

americano1 <- apriori(data = bakery, parameter = list(supp = 0.001, conf = 0.2),
                      appearance = list(default = "rhs", lhs = "americano"),
                      control = list(verbose = FALSE))
ra1<-sort(americano1, by="confidence", decreasing=TRUE)
inspect(head(ra1))

##     lhs            rhs                  support    confidence coverage 
## [1] {}          => {angbutter}          0.81528926 0.8152893  1.0000000
## [2] {americano} => {angbutter}          0.13471074 0.7912621  0.1702479
## [3] {}          => {plain.bread}        0.35413223 0.3541322  1.0000000
## [4] {}          => {tiramisu.croissant} 0.32190083 0.3219008  1.0000000
## [5] {}          => {croissant}          0.30867769 0.3086777  1.0000000
## [6] {americano} => {plain.bread}        0.04710744 0.2766990  0.1702479
##     lift      count
## [1] 1.0000000 1973 
## [2] 0.9705293  326 
## [3] 1.0000000  857 
## [4] 1.0000000  779 
## [5] 1.0000000  747 
## [6] 0.7813438  114

As the consequent, americano is bought together with other drinks, especially berry ade and caffe latte. Among snacks, tiramisu croissant and angbutter are the items after which customers ordered americano.

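If customers bought americano in the first place, the snack bought with it most often was angbutter. Another product bought along with americano was plain bread, although its lift of 0.78 shows that the two are bought together less often than would be expected if they were independent.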
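Several of the top rules in the croissant and americano analyses have confidence 1 but rest on only 3 orders (coverage 0.001239669 × 2420 ≈ 3). Since lift is confidence divided by the support of the consequent, and confidence cannot exceed 1, such rules automatically reach the largest lift possible for their consequent, n / count(consequent). A base-R check with the counts from the itemFrequency() output:

```r
n <- 2420
# Number of orders containing each consequent (from itemFrequency() above)
rhs_count <- c(croissant = 747, americano = 412)

# lift = confidence / supp(rhs) and confidence <= 1,
# so lift can never exceed 1 / supp(rhs) = n / count(rhs)
max_lift <- n / rhs_count
print(round(max_lift, 6))

# A confidence-1 rule, e.g. one whose antecedent occurs in 3 orders,
# all of which also contain the consequent, reaches that maximum
conf_one_lift <- (3 / 3) / (rhs_count / n)

# Rules with confidence 0.8 (4 of 5 orders) and croissant as consequent
lift_08 <- 0.8 / (rhs_count[["croissant"]] / n)
```

The values 3.239625 and 5.873786 are exactly the lifts printed for the confidence-1 rules in the two analyses, and 2.5917 matches the confidence-0.8 croissant rules. Because these rules rest on only 3 to 6 orders, their high confidence and lift should be read with caution.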

Conclusions

In this short paper I applied two association rule algorithms, Eclat and Apriori, to a dataset of bakery sales. The analysis identified the products (items) that occurred most often; given their values of the most commonly used measures, these products explain the majority of the behavior patterns in the orders. Aligning the bakery’s offer with these results may help achieve higher revenue from delivery orders.

References

https://www.kaggle.com/hosubjeong/bakery-sales
https://pub.towardsai.net/association-discovery-the-apriori-algorithm-28c1e71e0f04
https://select-statistics.co.uk/blog/market-basket-analysis-understanding-customer-behaviour/
Class Materials on Unsupervised Learning, Faculty of Economic Sciences, University of Warsaw

Association rules analysis with bakery sales data

Daniel Kornacki, student of the Faculty of Economic Sciences, University of Warsaw

28 02 2021
