“The Apriori algorithm is used for mining frequent itemsets and devising association rules from a transactional database. The parameters “support” and “confidence” are used. Support refers to items’ frequency of occurrence; confidence is a conditional probability.”
The confidence is calculated as the support of the items together (e.g. bread and milk) divided by the support of bread. So if bread appears on 10 receipts in total and appears together with milk on 8 of them, the confidence of the rule bread => milk is 8/10 = 0.8.
Source: https://www.educative.io/edpresso/what-is-the-apriori-algorithm
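The bread and milk arithmetic above can be sketched in a few lines of base R. This is only an illustration: the toy data of 100 receipts and every variable name are assumptions made for the example, not part of the `mb` data.

```r
# Hypothetical toy data: 100 receipts, one logical flag per receipt and item.
bread <- c(rep(TRUE, 10), rep(FALSE, 90))   # bread appears on 10 receipts
milk  <- c(rep(TRUE, 8), rep(FALSE, 92))    # milk appears on 8 of the bread receipts

# Support is the share of all receipts that contain the item(s).
support_bread      <- mean(bread)           # 10 of 100 receipts
support_bread_milk <- mean(bread & milk)    # 8 of 100 receipts

# confidence(bread => milk) = support(bread and milk) / support(bread)
confidence <- support_bread_milk / support_bread
print(confidence)                           # about 0.8
```

Dividing the raw counts (8 / 10) gives the same value, because the total number of receipts cancels out.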
# Mine association rules with minimum support 0.001 and minimum confidence 0.9
rules <- apriori(mb,
                 parameter = list(supp = 0.001, conf = 0.9, target = "rules"))
## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
## 18, 19, 20, 21, 22, 23 not logical or factor. Applying default discretization
## (see '? discretizeDF').
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[2165 item(s), 9834 transaction(s)] done [0.01s].
## sorting and recoding items ... [738 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [32 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
summary(rules)
## set of 32 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4
## 1 29 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 3.000 3.031 3.000 4.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001017 Min. :0.9000 Min. :0.001017 Min. : 13.96
## 1st Qu.:0.001093 1st Qu.:0.9286 1st Qu.:0.001119 1st Qu.: 26.70
## Median :0.001169 Median :1.0000 Median :0.001220 Median : 78.22
## Mean :0.001687 Mean :0.9695 Mean :0.001732 Mean : 78.61
## 3rd Qu.:0.001551 3rd Qu.:1.0000 3rd Qu.:0.001627 3rd Qu.:132.57
## Max. :0.010067 Max. :1.0000 Max. :0.010067 Max. :200.69
## count
## Min. :10.00
## 1st Qu.:10.75
## Median :11.50
## Mean :16.59
## 3rd Qu.:15.25
## Max. :99.00
##
## mining info:
## data ntransactions support confidence
## mb 9834 0.001 0.9
## call
## apriori(data = mb, parameter = list(supp = 0.001, conf = 0.9, target = "rules"))
The output below shows that the top association by lift is {other vegetables, butter} => {whole milk}, with a confidence of 1 and a lift of 200.7. Lift is calculated by dividing the confidence of a rule by the support of its right-hand side. In the bread and milk example, the confidence of 0.8 is divided by the support of milk: if milk appears on 16% of all receipts (support 0.16), the lift is 0.8/0.16 = 5.
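To make the lift calculation concrete, here is a minimal base R sketch of the bread and milk example. It assumes 100 receipts in total, with milk on 16 of them; the toy data and all variable names are made up for illustration and are unrelated to the `mb` data.

```r
# Hypothetical toy data: 100 receipts, one logical flag per receipt and item.
bread <- c(rep(TRUE, 10), rep(FALSE, 90))        # bread on 10 receipts
milk  <- c(rep(TRUE, 8), rep(FALSE, 2),          # milk on 8 of the bread receipts
           rep(TRUE, 8), rep(FALSE, 82))         # and on 8 receipts without bread

# confidence(bread => milk): share of bread receipts that also contain milk
confidence <- sum(bread & milk) / sum(bread)     # 8 / 10

# support(milk): share of all receipts that contain milk
support_milk <- mean(milk)                       # 16 / 100

# lift(bread => milk) = confidence(bread => milk) / support(milk)
lift <- confidence / support_milk
print(lift)                                      # about 5
```

A lift above 1 means milk shows up with bread more often than its overall frequency alone would suggest.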
kable(inspect(sort(rules, by="lift")[1:10]), "simple")
## lhs rhs support confidence coverage lift count
## [1] {X6=other vegetables,
## X8=butter} => {X7=whole milk} 0.001016880 1.0000000 0.001016880 200.6939 10
## [2] {margarine=tropical fruit,
## ready soups=pip fruit,
## X6=other vegetables} => {X7=whole milk} 0.001016880 0.9090909 0.001118568 182.4490 10
## [3] {X5=onions,
## X7=whole milk} => {X6=other vegetables} 0.001118568 1.0000000 0.001118568 144.6176 11
## [4] {ready soups=root vegetables,
## X7=whole milk} => {X6=other vegetables} 0.001016880 1.0000000 0.001016880 144.6176 10
## [5] {margarine=tropical fruit,
## ready soups=pip fruit,
## X7=whole milk} => {X6=other vegetables} 0.001016880 1.0000000 0.001016880 144.6176 10
## [6] {margarine=tropical fruit,
## X7=whole milk} => {X6=other vegetables} 0.001525320 0.9375000 0.001627008 135.5790 15
## [7] {ready soups=pip fruit,
## X7=whole milk} => {X6=other vegetables} 0.001321944 0.9285714 0.001423632 134.2878 13
## [8] {ready soups=root vegetables,
## X5=onions} => {X6=other vegetables} 0.001118568 0.9166667 0.001220256 132.5662 11
## [9] {citrus fruit=frankfurter,
## X7=whole milk} => {X6=other vegetables} 0.001118568 0.9166667 0.001220256 132.5662 11
## [10] {margarine=whole milk,
## X5=curd} => {ready soups=butter} 0.001525320 1.0000000 0.001525320 109.2667 15
## Warning in kable_pipe(x, padding = padding, ...): The table should have a header
## (column names)