Import data
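
The warning printed by apriori() further down (columns "not logical or factor") indicates that mb was a data frame, so each column position was treated as a variable. That is why item labels such as X6=other vegetables appear in the rules. A minimal sketch of one alternative, assuming the receipts are stored in a basket-format text file (one receipt per line, items separated by commas): read.transactions() from arules reads such a file directly into a transactions object. The temporary file, its three receipts, and the name trans are made up for illustration.

```r
library(arules)

# Hypothetical basket-format file: one receipt per line, items separated by commas.
basket_file <- tempfile(fileext = ".csv")
writeLines(c("bread,milk",
             "bread,butter,milk",
             "onions,other vegetables"),
           basket_file)

# read.transactions() builds a transactions object with one item per distinct product.
trans <- read.transactions(basket_file, format = "basket", sep = ",")

length(trans)      # number of receipts read
itemLabels(trans)  # distinct products, without column-name prefixes
```

Passing an object like trans to apriori() would mine rules over the products themselves rather than over column/value pairs.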

Create associations with the Apriori algorithm

“The Apriori algorithm is used for mining frequent itemsets and devising association rules from a transactional database. The parameters ‘support’ and ‘confidence’ are used. Support refers to items’ frequency of occurrence; confidence is a conditional probability.”

Confidence is calculated as the support of the items together (e.g. bread and milk) divided by the support of the left-hand item (bread). So if bread occurs together with milk in 8 receipts and bread appears in 10 receipts in total, the confidence of the rule bread => milk is 8/10 = 0.8.

Source: https://www.educative.io/edpresso/what-is-the-apriori-algorithm
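
The bread and milk confidence calculation can be reproduced in a few lines of base R. This is only an illustrative sketch: the total of 100 receipts and all variable names are assumptions, not values from the dataset.

```r
# Hypothetical counts for the bread and milk example.
n_receipts       <- 100  # assumed total number of receipts
n_bread          <- 10   # receipts containing bread
n_bread_and_milk <- 8    # receipts containing both bread and milk

# Support is a proportion of all receipts.
support_bread          <- n_bread / n_receipts
support_bread_and_milk <- n_bread_and_milk / n_receipts

# Confidence of the rule bread => milk.
confidence_bread_milk <- support_bread_and_milk / support_bread
print(confidence_bread_milk)
```

The total number of receipts cancels out, so the confidence equals the ratio of the two counts, 8/10.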

rules <- apriori(mb, 
    parameter = list(supp = 0.001, conf = 0.9, target = "rules"))
## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
## 18, 19, 20, 21, 22, 23 not logical or factor. Applying default discretization
## (see '? discretizeDF').
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[2165 item(s), 9834 transaction(s)] done [0.01s].
## sorting and recoding items ... [738 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [32 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(rules)
## set of 32 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2  3  4 
##  1 29  2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   3.000   3.031   3.000   4.000 
## 
## summary of quality measures:
##     support           confidence        coverage             lift       
##  Min.   :0.001017   Min.   :0.9000   Min.   :0.001017   Min.   : 13.96  
##  1st Qu.:0.001093   1st Qu.:0.9286   1st Qu.:0.001119   1st Qu.: 26.70  
##  Median :0.001169   Median :1.0000   Median :0.001220   Median : 78.22  
##  Mean   :0.001687   Mean   :0.9695   Mean   :0.001732   Mean   : 78.61  
##  3rd Qu.:0.001551   3rd Qu.:1.0000   3rd Qu.:0.001627   3rd Qu.:132.57  
##  Max.   :0.010067   Max.   :1.0000   Max.   :0.010067   Max.   :200.69  
##      count      
##  Min.   :10.00  
##  1st Qu.:10.75  
##  Median :11.50  
##  Mean   :16.59  
##  3rd Qu.:15.25  
##  Max.   :99.00  
## 
## mining info:
##  data ntransactions support confidence
##    mb          9834   0.001        0.9
##                                                                              call
##  apriori(data = mb, parameter = list(supp = 0.001, conf = 0.9, target = "rules"))

Inspect the top 10 associations by lift

The output below shows that the top rule by lift is {other vegetables, butter} => {whole milk}, with a confidence of 1 and a lift of 200.7. Lift is the confidence of a rule divided by the support of its right-hand side, where support is expressed as a proportion of all receipts. In the bread and milk example, the confidence of 0.8 is divided by the support of milk: if milk appears in 16% of all receipts (a support of 0.16), the lift is 0.8 / 0.16 = 5. A lift above 1 means the items occur together more often than would be expected if they were independent.
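
The lift calculation for the bread and milk example can be sketched in base R as follows. The total of 100 receipts (so that milk in 16 receipts gives a support of 0.16) and the variable names are assumptions for illustration only.

```r
# Hypothetical counts for the bread and milk example.
n_receipts       <- 100  # assumed total number of receipts
n_bread          <- 10   # receipts containing bread
n_milk           <- 16   # receipts containing milk
n_bread_and_milk <- 8    # receipts containing both

# Confidence of bread => milk and support of the right-hand side (milk).
confidence_bread_milk <- n_bread_and_milk / n_bread
support_milk          <- n_milk / n_receipts

# Lift = confidence / support of the right-hand side.
lift_bread_milk <- confidence_bread_milk / support_milk

# Equivalent form: joint support divided by the product of the individual supports.
lift_check <- (n_bread_and_milk / n_receipts) /
  ((n_bread / n_receipts) * (n_milk / n_receipts))

print(lift_bread_milk)
```

Both forms give the same value, which shows that lift is symmetric: milk => bread has the same lift as bread => milk.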

# inspect() prints the rules itself, so no kable() wrapper is needed
inspect(sort(rules, by = "lift")[1:10])
##      lhs                               rhs                       support confidence    coverage     lift count
## [1]  {X6=other vegetables,                                                                                    
##       X8=butter}                    => {X7=whole milk}       0.001016880  1.0000000 0.001016880 200.6939    10
## [2]  {margarine=tropical fruit,                                                                               
##       ready soups=pip fruit,                                                                                  
##       X6=other vegetables}          => {X7=whole milk}       0.001016880  0.9090909 0.001118568 182.4490    10
## [3]  {X5=onions,                                                                                              
##       X7=whole milk}                => {X6=other vegetables} 0.001118568  1.0000000 0.001118568 144.6176    11
## [4]  {ready soups=root vegetables,                                                                            
##       X7=whole milk}                => {X6=other vegetables} 0.001016880  1.0000000 0.001016880 144.6176    10
## [5]  {margarine=tropical fruit,                                                                               
##       ready soups=pip fruit,                                                                                  
##       X7=whole milk}                => {X6=other vegetables} 0.001016880  1.0000000 0.001016880 144.6176    10
## [6]  {margarine=tropical fruit,                                                                               
##       X7=whole milk}                => {X6=other vegetables} 0.001525320  0.9375000 0.001627008 135.5790    15
## [7]  {ready soups=pip fruit,                                                                                  
##       X7=whole milk}                => {X6=other vegetables} 0.001321944  0.9285714 0.001423632 134.2878    13
## [8]  {ready soups=root vegetables,                                                                            
##       X5=onions}                    => {X6=other vegetables} 0.001118568  0.9166667 0.001220256 132.5662    11
## [9]  {citrus fruit=frankfurter,                                                                               
##       X7=whole milk}                => {X6=other vegetables} 0.001118568  0.9166667 0.001220256 132.5662    11
## [10] {margarine=whole milk,                                                                                   
##       X5=curd}                      => {ready soups=butter}  0.001525320  1.0000000 0.001525320 109.2667    15
