The main aim of this report is to analyze relationships between ingredients in various recipes. Association rules are a data mining method for identifying associations and correlations between attribute values. In data science they are used to discover correlations and co-occurrences within data sets, and they are best suited for explaining patterns in data from seemingly unrelated information repositories, such as relational and transactional databases. The process of finding such rules is usually called association rule mining.
[https://www.sciencedirect.com/topics/computer-science/association-rules]
I decided to explore almost 50,000 recipes in order to investigate association rules between ingredients. The data come from the Cookteau database, which consists of 45,772 recipes and 1,033 ingredients. First, I transformed the database downloaded from the website so that each row contains all the ingredients needed for a given recipe, with each ingredient in a separate column.
[https://cookteau.com/en/home-2/]
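The original layout of the downloaded data is not shown in this report, so the following is only a sketch of the kind of reshaping described above. It assumes a hypothetical long format with one (recipe, ingredient) pair per row; the names `long`, `recipe_id`, `ingredient` and `wide` are illustrative and not taken from the Cookteau data.

```r
# Hypothetical long-format input: one row per (recipe, ingredient) pair.
long <- data.frame(
  recipe_id  = c(1, 1, 1, 2, 2, 3),
  ingredient = c("garlic", "olive", "salt", "egg", "flour", "butter"),
  stringsAsFactors = FALSE
)

# Collect the ingredients of each recipe into one character vector.
baskets <- split(long$ingredient, long$recipe_id)

# Pad every basket with empty strings so that all rows have the same width,
# then bind them into a matrix with one ingredient per column.
width <- max(lengths(baskets))
wide <- t(sapply(baskets, function(x) c(x, rep("", width - length(x)))))
colnames(wide) <- paste0("Item", seq_len(width))

# A semicolon-separated file of this shape could then be written with:
# write.table(wide, "recipes.csv", sep = ";", row.names = FALSE, quote = FALSE)
print(wide)
```

Padding with empty strings gives every recipe the same number of columns, which matches the wide table sampled below, where short recipes leave the trailing columns empty.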
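library(arules)      # read.transactions(), apriori(), inspect()
library(arulesViz)   # plot() methods for rules
library(kableExtra)  # kbl(), kable_styling(), and the %>% pipe
# Read the raw table: one recipe per row, one ingredient per column
trans <- read.csv("recipes.csv", sep=";", header=TRUE)
# Display a random sample of 10 recipes as a formatted table
trans2 <- trans[sample(nrow(trans), 10), ]
trans2 %>%
  kbl() %>%
  kable_styling()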
Row | Item1 | Item2 | Item3 | Item4 | Item5 | Item6 | Item7 | Item8 | Item9 | Item10 | Item11 | Item11.1 | Item12 | Item13 | Item14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13272 | chicken | cinnamon | cornstarch | cumin | garlic | olive | onion | pepper bell | pineapple | salt | tomato | vegetable oil | water | | |
1548 | buttermilk | cayenne | chickpea | egg | ginger garlic paste | pomegranate | potato | spinach | sunflower | turmeric | | | | | |
40338 | egg | eggplant | garlic | nutmeg | olive | pepper | pepper | ricotta cheese | salt | salt | tomato | water | | | |
5994 | bay leaf | celery | chicken | chicken | garlic | onion | oregano | pepper bell | rice | salt | sausage | sausage | shrimp | tomato | turmeric |
36563 | butter | chive | horseradish | olive | steak | | | | | | | | | | |
4910 | bamboo shoot | cake | cake | carrot | chicken | daikon | salt | shiitake | shrimp | soy sauce | spinach | water | | | |
15663 | asparagu | basil | butter | cheese parmesan | garlic | olive | oregano | pepper | pepper | pepper bell | shallot | shrimp | | | |
15696 | chicken | garlic | ginger | onion | soy sauce | sugar | | | | | | | | | |
42938 | garlic | peanut oil | pepper | salt | string bean | turkey | | | | | | | | | |
18504 | celery | celery | egg | garlic | mayonnaise | mustard | onion | pepper | potato | relish | | | | | |

The remaining columns, up to Item61, are empty for these ten rows and are omitted.
nrow(trans)
[1] 45749
ncol(trans)
[1] 63
# skip=0 also reads the CSV header line as a transaction; skip=1 would exclude it
trans1 <- read.transactions("recipes.csv", format="basket", sep=";", skip=0)
The most frequent items are listed below. As expected, the most common ingredients are seasonings: salt and pepper. Garlic, onion and olive are also very popular. The longest transaction has 60 items, but it is most likely the CSV header line rather than a recipe: read.transactions was called with skip=0, and it reports 45750 transactions while nrow(trans) gives 45749 rows. The longest remaining transaction has 38 ingredients. Only 278 out of 45750 transactions contain a single ingredient. The mean and median are similar, about 9 ingredients per recipe. Looking at the quartiles, 75% of recipes have at most 11 ingredients. The density is 0.009802723, which is the proportion of non-zero cells in the transaction-by-item matrix.
summary(trans1)
## transactions as itemMatrix in sparse format with
## 45750 rows (elements/itemsets/transactions) and
## 925 columns (items) and a density of 0.009802723
##
## most frequent items:
## salt pepper garlic onion olive (Other)
## 18745 16065 15995 13877 13476 336681
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 278 814 1565 2439 3438 4299 4699 4893 4657 4060 3513 2808 2225 1618 1276 947
## 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
## 656 501 337 213 177 102 92 45 26 20 8 10 12 3 6 6
## 33 34 37 38 60
## 3 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 6.000 9.000 9.068 11.000 60.000
##
## includes extended item information - examples:
## labels
## 1 abalone
## 2 adobo
## 3 adobo sauce
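As a quick arithmetic check, the density and the mean transaction length can be recomputed from the counts in the summary printout. This is only a sketch using the printed numbers; the variable names are illustrative.

```r
# Values copied from the summary(trans1) printout above.
n_transactions <- 45750
n_items        <- 925
top_counts     <- c(salt = 18745, pepper = 16065, garlic = 15995,
                    onion = 13877, olive = 13476)
other_count    <- 336681

# Total number of non-zero cells = total number of item occurrences.
total_occurrences <- sum(top_counts) + other_count

# Density = non-zero cells / all cells of the transaction-by-item matrix.
density <- total_occurrences / (n_transactions * n_items)
print(density)

# The mean transaction length follows from the same total.
mean_length <- total_occurrences / n_transactions
print(mean_length)
```

Both values agree with the printout (density 0.009802723, mean about 9.068) to the printed precision.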
The R printout below shows a sample of 10 transactions (numbers 2 to 11) and the ingredients stored in them. Most contain many ingredients, but the number of items varies. For example, the first basket shown has 4 ingredients: capsicum, pepper bell, soy sauce, sunflower; the seventh has 8 ingredients: asafoetida, cayenne, chickpea, fennel, fenugreek, mustard oil, nigella seed, turmeric.
inspect(trans1[2:11])
## items
## [1] {capsicum,
## pepper bell,
## soy sauce,
## sunflower}
## [2] {buttermilk,
## cumin,
## fenugreek,
## ginger garlic paste,
## mustard oil,
## nigella seed,
## pepper bell,
## potato,
## sunflower}
## [3] {asafoetida,
## cayenne,
## fenugreek,
## ginger garlic paste,
## mustard oil,
## sesame,
## sunflower,
## turmeric}
## [4] {butter,
## cardamom,
## cashew,
## cayenne,
## cinnamon,
## clove,
## coriander,
## corn grit,
## cumin,
## potato,
## raisin,
## sunflower,
## tomato,
## turmeric}
## [5] {curry leaf,
## lemon,
## sunflower}
## [6] {coriander,
## cumin,
## mint,
## mustard oil,
## pepper bell,
## potato,
## sunflower,
## turmeric}
## [7] {asafoetida,
## cayenne,
## chickpea,
## fennel,
## fenugreek,
## mustard oil,
## nigella seed,
## turmeric}
## [8] {anise,
## asafoetida,
## cayenne,
## fenugreek,
## mango,
## mustard oil,
## nigella seed,
## sunflower,
## turmeric}
## [9] {cayenne,
## coriander,
## fennel,
## mango,
## nigella seed,
## sunflower,
## turmeric}
## [10] {buttermilk,
## mint}
# itemFrequencyPlot(trans1, support = 0.1)
We can check which products are most common in the recipes dataset. In addition to the ingredients already mentioned, sugar, butter, egg, water, flour, tomato, chicken, cream, parsley and vegetable oil are also often present. It is worth noting that all of these items have support of at least 10%.
itemFrequencyPlot(trans1, topN=15, type="absolute", main="Item Frequency")
itemFrequencyPlot(trans1, topN=15, type="relative", main="Item Frequency")
The Apriori algorithm proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. [https://en.wikipedia.org/wiki/Apriori_algorithm]
The first step is to compute the support of each individual item. Next, we need to decide on the support threshold. The support of an association rule is the percentage of groups (here, recipes) that contain all of the items listed in that rule, that is, both the rule body and the rule head. It is calculated as the number of groups containing all the items that appear in the rule divided by the total number of groups considered.
[https://www.ibm.com/docs/en/db2/9.7?topic=associations-support-in-association-rule]
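The definition of support can be illustrated on a few toy baskets in base R. The baskets and the helper function `support()` are hypothetical and are not part of the recipes data or of the arules package.

```r
# Toy baskets (hypothetical, not taken from the recipes data).
baskets <- list(
  c("flour", "butter", "salt"),
  c("flour", "egg", "sugar"),
  c("garlic", "olive", "pepper"),
  c("flour", "butter", "egg"),
  c("garlic", "onion")
)

# Support of an itemset = share of baskets that contain every item in it.
support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

print(support("flour", baskets))               # flour is in 3 of 5 baskets
print(support(c("flour", "butter"), baskets))  # both are in 2 of 5 baskets
```

Adding an item to an itemset can never increase its support, which is the property the Apriori algorithm relies on when it extends frequent itemsets.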
There is an additional measure called confidence. The confidence tells you the percentage of cases in which the rule is valid, that is, the share of groups containing the rule body that also contain the rule head. 100% confidence means that the association always occurs; 50%, for example, means that the rule holds only half of the time.
[https://towardsdatascience.com/the-apriori-algorithm-5da3db9aea95]
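As a worked example, the confidence of the rule {olive} => {garlic} can be recomputed from counts printed elsewhere in this report: 45750 transactions, 13476 containing olive, and 8142 containing both olive and garlic. The variable names are illustrative.

```r
# Counts taken from the printouts in this report.
n_total        <- 45750   # transactions
n_olive        <- 13476   # transactions containing olive
n_olive_garlic <- 8142    # transactions containing both olive and garlic

support_rule <- n_olive_garlic / n_total   # support of {olive} => {garlic}
coverage_lhs <- n_olive / n_total          # support of the left-hand side
confidence   <- support_rule / coverage_lhs

print(round(c(support = support_rule, coverage = coverage_lhs,
              confidence = confidence), 7))
```

The result matches the support, coverage and confidence columns reported for this rule, and shows that support equals confidence times coverage.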
Once we have obtained the rules, the last step is to compute the lift of each rule. According to the definition, the lift of a rule is a performance metric that indicates the strength of the association between the products in the rule. Lift compares the confidence of the rule with how often the rule head occurs in the overall dataset: a value above 1 means that the head appears together with the body more often than would be expected if the two were independent.
[https://towardsdatascience.com/the-apriori-algorithm-5da3db9aea95]
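A worked example of lift, using values printed in the rule tables of this report for {flour} => {butter}. The support of {butter} is read from the coverage column of the rule {butter} => {salt}; the variable names are illustrative.

```r
# Values copied from the rule tables in this report.
conf_flour_butter <- 0.6137780   # confidence of {flour} => {butter}
supp_rule         <- 0.1162623   # support of {flour} => {butter}
supp_flour        <- 0.1894208   # coverage of the rule, i.e. support of {flour}
supp_butter       <- 0.2766339   # support of {butter}

# Lift = confidence / support of the right-hand side.
lift <- conf_flour_butter / supp_butter

# Equivalent form: joint support relative to what independence would give.
lift_alt <- supp_rule / (supp_flour * supp_butter)

print(round(c(lift = lift, lift_alt = lift_alt), 4))
```

Both forms give about 2.22, in line with the lift column: butter appears in recipes with flour roughly 2.2 times as often as in recipes overall.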
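The cited source describes an Apriori implementation that starts with a minimum support of 100% of the data items and decreases this in steps of 5% until there are at least 10 rules with the required minimum confidence of 0.9 or until the support has reached a lower bound of 10%, whichever occurs first. (These default values can be changed.) The apriori() function from the arules package used in this report does not iterate in this way: it takes fixed minimum support and confidence thresholds through its parameter argument.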
[https://www.sciencedirect.com/topics/computer-science/minimum-confidence]
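The level-wise search described above (frequent single items first, then larger and larger itemsets) can be sketched in base R on toy data. This is a minimal illustration of the idea under my own assumptions, not the implementation used by arules; the baskets, the threshold and all names are hypothetical.

```r
# Toy baskets (hypothetical, not taken from the recipes data).
baskets <- list(
  c("flour", "butter", "salt"),
  c("flour", "butter", "egg"),
  c("flour", "egg", "sugar"),
  c("garlic", "olive", "pepper"),
  c("garlic", "olive", "onion"),
  c("flour", "butter", "egg", "sugar")
)
min_support <- 0.5

# Share of baskets that contain every item of the itemset.
support <- function(itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Level 1: frequent single items.
items <- sort(unique(unlist(baskets)))
level <- Filter(function(s) support(s) >= min_support, as.list(items))
frequent <- level

# Level k: join frequent (k-1)-itemsets, prune, count, repeat.
while (length(level) > 1) {
  k <- length(level[[1]]) + 1
  # Candidate generation: unions of two frequent itemsets that have size k.
  candidates <- list()
  for (i in seq_along(level)) {
    for (j in seq_along(level)) {
      if (i < j) {
        u <- sort(union(level[[i]], level[[j]]))
        if (length(u) == k) candidates[[paste(u, collapse = ",")]] <- u
      }
    }
  }
  # Prune step: every (k-1)-subset of a candidate must itself be frequent.
  keys <- vapply(level, paste, character(1), collapse = ",")
  candidates <- Filter(function(u) {
    all(vapply(seq_along(u),
               function(d) paste(u[-d], collapse = ",") %in% keys,
               logical(1)))
  }, candidates)
  # Keep only the candidates that reach the support threshold.
  level <- unname(Filter(function(u) support(u) >= min_support, candidates))
  frequent <- c(frequent, level)
}

for (s in frequent) {
  cat("{", paste(s, collapse = ", "), "}: ", support(s), "\n", sep = "")
}
```

In this toy run the candidate {butter, egg, flour} is discarded in the prune step without counting it, because its subset {butter, egg} is not frequent.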
Rules have to refer to at least two products and meet the minimum support and confidence values. First, the Apriori algorithm was run with minimum support = 0.1 and minimum confidence = 0.6. For these values only 8 rules are detected. I analyze this case below, as well as the association rules obtained for confidence = 0.55.
rules1b<-apriori(trans1, parameter=list(supp=0.10, conf=0.60))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 4575
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[925 item(s), 45750 transaction(s)] done [0.07s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [8 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
plot(rules1b, measure=c("support","lift"), shading="confidence", main="Recipe ingredient rules")
plot(rules1b)
plot(rules1b, method="grouped")
plot(rules1b, method="graph", measure="support", shading="lift", main="Graph for 8 rules")
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
Another very useful option is to sort the rules by support, confidence and lift. The highest support value is almost 18%: the pair {olive -> garlic} occurs in 17.8% of all recipes (8142 times). The highest confidence achieved is 67% for {flour -> salt}, which tells us that a recipe that uses flour also uses salt with 67% probability. The second highest confidence suggests that a recipe that uses both garlic and salt also uses pepper with a probability of 67%. Lastly, the highest lift value, 2.22, is recorded for the rule {flour -> butter}. Comparing the second and fourth rules in the lift ranking, garlic is more likely to be used together with onion and pepper (confidence 66.8%) than with olive and pepper (65.4%).
rules.by.supp<-sort(rules1b, by="support", decreasing=TRUE)
inspect(rules.by.supp)
## lhs rhs support confidence coverage lift count
## [1] {olive} => {garlic} 0.1779672 0.6041852 0.2945574 1.728132 8142
## [2] {flour} => {salt} 0.1270164 0.6705516 0.1894208 1.636582 5811
## [3] {flour} => {butter} 0.1162623 0.6137780 0.1894208 2.218738 5319
## [4] {garlic, salt} => {pepper} 0.1108415 0.6697926 0.1654863 1.907439 5071
## [5] {olive, pepper} => {garlic} 0.1094645 0.6539566 0.1673880 1.870492 5008
## [6] {garlic, olive} => {pepper} 0.1094645 0.6150823 0.1779672 1.751635 5008
## [7] {tomato} => {garlic} 0.1057705 0.6330455 0.1670820 1.810680 4839
## [8] {onion, pepper} => {garlic} 0.1043716 0.6675521 0.1563497 1.909378 4775
rules.by.conf<-sort(rules1b, by="confidence", decreasing=TRUE)
inspect(rules.by.conf)
## lhs rhs support confidence coverage lift count
## [1] {flour} => {salt} 0.1270164 0.6705516 0.1894208 1.636582 5811
## [2] {garlic, salt} => {pepper} 0.1108415 0.6697926 0.1654863 1.907439 5071
## [3] {onion, pepper} => {garlic} 0.1043716 0.6675521 0.1563497 1.909378 4775
## [4] {olive, pepper} => {garlic} 0.1094645 0.6539566 0.1673880 1.870492 5008
## [5] {tomato} => {garlic} 0.1057705 0.6330455 0.1670820 1.810680 4839
## [6] {garlic, olive} => {pepper} 0.1094645 0.6150823 0.1779672 1.751635 5008
## [7] {flour} => {butter} 0.1162623 0.6137780 0.1894208 2.218738 5319
## [8] {olive} => {garlic} 0.1779672 0.6041852 0.2945574 1.728132 8142
rules.by.lift<-sort(rules1b, by="lift", decreasing=TRUE)
inspect(rules.by.lift)
## lhs rhs support confidence coverage lift count
## [1] {flour} => {butter} 0.1162623 0.6137780 0.1894208 2.218738 5319
## [2] {onion, pepper} => {garlic} 0.1043716 0.6675521 0.1563497 1.909378 4775
## [3] {garlic, salt} => {pepper} 0.1108415 0.6697926 0.1654863 1.907439 5071
## [4] {olive, pepper} => {garlic} 0.1094645 0.6539566 0.1673880 1.870492 5008
## [5] {tomato} => {garlic} 0.1057705 0.6330455 0.1670820 1.810680 4839
## [6] {garlic, olive} => {pepper} 0.1094645 0.6150823 0.1779672 1.751635 5008
## [7] {olive} => {garlic} 0.1779672 0.6041852 0.2945574 1.728132 8142
## [8] {flour} => {salt} 0.1270164 0.6705516 0.1894208 1.636582 5811
plot(rules1b, method="paracoord", control=list(reorder=TRUE))
For support = 0.1 and confidence = 0.55, 19 rules are detected, which I consider a satisfactory result.
rules1a<-apriori(trans1, parameter=list(supp=0.10, conf=0.55))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.55 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 4575
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[925 item(s), 45750 transaction(s)] done [0.09s].
## sorting and recoding items ... [17 item(s)] done [0.01s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [19 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
plot_rules1a <- plot(rules1a, measure=c("support","lift"), shading="confidence", main="Recipe ingredient rules")
plot(rules1a)
plot(rules1a, method="grouped")
plot(rules1a, method="graph", measure="support", shading="lift", main="Graph for 19 rules")
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
Again, the rules can be sorted by support, confidence and lift. The highest support value is 19.78%: the pair {pepper -> garlic} occurs in 19.78% of all recipes (9049 times). The highest confidence is still 67% for {flour -> salt}: a recipe that uses flour also uses salt with 67% probability. The second highest confidence suggests that a recipe that uses both garlic and salt also uses pepper with a probability of 67%. Lastly, the highest lift value, 2.56, is recorded for the rule {flour -> egg}. Comparing the third and seventh rules in the lift ranking, garlic is more likely to be used together with onion and pepper than with olive and pepper.
rules.by.supp<-sort(rules1a, by="support", decreasing=TRUE)
inspect(rules.by.supp)
## lhs rhs support confidence coverage lift count
## [1] {pepper} => {garlic} 0.1977923 0.5632742 0.3511475 1.611116 9049
## [2] {garlic} => {pepper} 0.1977923 0.5657393 0.3496175 1.611116 9049
## [3] {olive} => {garlic} 0.1779672 0.6041852 0.2945574 1.728132 8142
## [4] {onion} => {garlic} 0.1773552 0.5847085 0.3033224 1.672424 8114
## [5] {olive} => {pepper} 0.1673880 0.5682695 0.2945574 1.618321 7658
## [6] {egg} => {salt} 0.1307978 0.5688754 0.2299235 1.388426 5984
## [7] {flour} => {salt} 0.1270164 0.6705516 0.1894208 1.636582 5811
## [8] {flour} => {butter} 0.1162623 0.6137780 0.1894208 2.218738 5319
## [9] {flour} => {egg} 0.1113880 0.5880452 0.1894208 2.557569 5096
## [10] {garlic, pepper} => {salt} 0.1108415 0.5603934 0.1977923 1.367725 5071
## [11] {pepper, salt} => {garlic} 0.1108415 0.5852279 0.1893989 1.673909 5071
## [12] {garlic, salt} => {pepper} 0.1108415 0.6697926 0.1654863 1.907439 5071
## [13] {olive, pepper} => {garlic} 0.1094645 0.6539566 0.1673880 1.870492 5008
## [14] {garlic, olive} => {pepper} 0.1094645 0.6150823 0.1779672 1.751635 5008
## [15] {garlic, pepper} => {olive} 0.1094645 0.5534313 0.1977923 1.878857 5008
## [16] {flour} => {sugar} 0.1060109 0.5596584 0.1894208 1.906506 4850
## [17] {tomato} => {garlic} 0.1057705 0.6330455 0.1670820 1.810680 4839
## [18] {onion, pepper} => {garlic} 0.1043716 0.6675521 0.1563497 1.909378 4775
## [19] {garlic, onion} => {pepper} 0.1043716 0.5884890 0.1773552 1.675902 4775
rules.by.conf<-sort(rules1a, by="confidence", decreasing=TRUE)
inspect(rules.by.conf)
## lhs rhs support confidence coverage lift count
## [1] {flour} => {salt} 0.1270164 0.6705516 0.1894208 1.636582 5811
## [2] {garlic, salt} => {pepper} 0.1108415 0.6697926 0.1654863 1.907439 5071
## [3] {onion, pepper} => {garlic} 0.1043716 0.6675521 0.1563497 1.909378 4775
## [4] {olive, pepper} => {garlic} 0.1094645 0.6539566 0.1673880 1.870492 5008
## [5] {tomato} => {garlic} 0.1057705 0.6330455 0.1670820 1.810680 4839
## [6] {garlic, olive} => {pepper} 0.1094645 0.6150823 0.1779672 1.751635 5008
## [7] {flour} => {butter} 0.1162623 0.6137780 0.1894208 2.218738 5319
## [8] {olive} => {garlic} 0.1779672 0.6041852 0.2945574 1.728132 8142
## [9] {garlic, onion} => {pepper} 0.1043716 0.5884890 0.1773552 1.675902 4775
## [10] {flour} => {egg} 0.1113880 0.5880452 0.1894208 2.557569 5096
## [11] {pepper, salt} => {garlic} 0.1108415 0.5852279 0.1893989 1.673909 5071
## [12] {onion} => {garlic} 0.1773552 0.5847085 0.3033224 1.672424 8114
## [13] {egg} => {salt} 0.1307978 0.5688754 0.2299235 1.388426 5984
## [14] {olive} => {pepper} 0.1673880 0.5682695 0.2945574 1.618321 7658
## [15] {garlic} => {pepper} 0.1977923 0.5657393 0.3496175 1.611116 9049
## [16] {pepper} => {garlic} 0.1977923 0.5632742 0.3511475 1.611116 9049
## [17] {garlic, pepper} => {salt} 0.1108415 0.5603934 0.1977923 1.367725 5071
## [18] {flour} => {sugar} 0.1060109 0.5596584 0.1894208 1.906506 4850
## [19] {garlic, pepper} => {olive} 0.1094645 0.5534313 0.1977923 1.878857 5008
rules.by.lift<-sort(rules1a, by="lift", decreasing=TRUE)
inspect(rules.by.lift)
## lhs rhs support confidence coverage lift count
## [1] {flour} => {egg} 0.1113880 0.5880452 0.1894208 2.557569 5096
## [2] {flour} => {butter} 0.1162623 0.6137780 0.1894208 2.218738 5319
## [3] {onion, pepper} => {garlic} 0.1043716 0.6675521 0.1563497 1.909378 4775
## [4] {garlic, salt} => {pepper} 0.1108415 0.6697926 0.1654863 1.907439 5071
## [5] {flour} => {sugar} 0.1060109 0.5596584 0.1894208 1.906506 4850
## [6] {garlic, pepper} => {olive} 0.1094645 0.5534313 0.1977923 1.878857 5008
## [7] {olive, pepper} => {garlic} 0.1094645 0.6539566 0.1673880 1.870492 5008
## [8] {tomato} => {garlic} 0.1057705 0.6330455 0.1670820 1.810680 4839
## [9] {garlic, olive} => {pepper} 0.1094645 0.6150823 0.1779672 1.751635 5008
## [10] {olive} => {garlic} 0.1779672 0.6041852 0.2945574 1.728132 8142
## [11] {garlic, onion} => {pepper} 0.1043716 0.5884890 0.1773552 1.675902 4775
## [12] {pepper, salt} => {garlic} 0.1108415 0.5852279 0.1893989 1.673909 5071
## [13] {onion} => {garlic} 0.1773552 0.5847085 0.3033224 1.672424 8114
## [14] {flour} => {salt} 0.1270164 0.6705516 0.1894208 1.636582 5811
## [15] {olive} => {pepper} 0.1673880 0.5682695 0.2945574 1.618321 7658
## [16] {garlic} => {pepper} 0.1977923 0.5657393 0.3496175 1.611116 9049
## [17] {pepper} => {garlic} 0.1977923 0.5632742 0.3511475 1.611116 9049
## [18] {egg} => {salt} 0.1307978 0.5688754 0.2299235 1.388426 5984
## [19] {garlic, pepper} => {salt} 0.1108415 0.5603934 0.1977923 1.367725 5071
plot(rules1a, method="paracoord", control=list(reorder=TRUE))
In order to find the most interesting association rules, I decided to analyze more rules at a lower confidence level of 0.50.
rules1c<-apriori(trans1, parameter=list(supp=0.10, conf=0.50))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 4575
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[925 item(s), 45750 transaction(s)] done [0.07s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [28 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
plot(rules1c, measure=c("support","lift"), shading="confidence", main="Recipe ingredient rules")
plot(rules1c)
plot(rules1c, method="grouped")
plot(rules1c, method="graph", measure="support", shading="lift", main="Graph for 28 rules")
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
Although the number of rules increases to 28, the highest support, confidence and lift values are the same as in the previous version with confidence = 0.55.
rules.by.supp<-sort(rules1c, by="support", decreasing=TRUE)
inspect(rules.by.supp)
## lhs rhs support confidence coverage lift count
## [1] {pepper} => {garlic} 0.1977923 0.5632742 0.3511475 1.611116 9049
## [2] {garlic} => {pepper} 0.1977923 0.5657393 0.3496175 1.611116 9049
## [3] {pepper} => {salt} 0.1893989 0.5393713 0.3511475 1.316417 8665
## [4] {olive} => {garlic} 0.1779672 0.6041852 0.2945574 1.728132 8142
## [5] {garlic} => {olive} 0.1779672 0.5090341 0.3496175 1.728132 8142
## [6] {onion} => {garlic} 0.1773552 0.5847085 0.3033224 1.672424 8114
## [7] {garlic} => {onion} 0.1773552 0.5072835 0.3496175 1.672424 8114
## [8] {olive} => {pepper} 0.1673880 0.5682695 0.2945574 1.618321 7658
## [9] {onion} => {pepper} 0.1563497 0.5154572 0.3033224 1.467922 7153
## [10] {sugar} => {salt} 0.1495738 0.5095309 0.2935519 1.243587 6843
## [11] {butter} => {salt} 0.1391694 0.5030815 0.2766339 1.227846 6367
## [12] {egg} => {salt} 0.1307978 0.5688754 0.2299235 1.388426 5984
## [13] {flour} => {salt} 0.1270164 0.6705516 0.1894208 1.636582 5811
## [14] {egg} => {sugar} 0.1261421 0.5486263 0.2299235 1.868924 5771
## [15] {flour} => {butter} 0.1162623 0.6137780 0.1894208 2.218738 5319
## [16] {water} => {salt} 0.1139454 0.5396480 0.2111475 1.317092 5213
## [17] {flour} => {egg} 0.1113880 0.5880452 0.1894208 2.557569 5096
## [18] {garlic, pepper} => {salt} 0.1108415 0.5603934 0.1977923 1.367725 5071
## [19] {pepper, salt} => {garlic} 0.1108415 0.5852279 0.1893989 1.673909 5071
## [20] {garlic, salt} => {pepper} 0.1108415 0.6697926 0.1654863 1.907439 5071
## [21] {olive, pepper} => {garlic} 0.1094645 0.6539566 0.1673880 1.870492 5008
## [22] {garlic, olive} => {pepper} 0.1094645 0.6150823 0.1779672 1.751635 5008
## [23] {garlic, pepper} => {olive} 0.1094645 0.5534313 0.1977923 1.878857 5008
## [24] {flour} => {sugar} 0.1060109 0.5596584 0.1894208 1.906506 4850
## [25] {tomato} => {garlic} 0.1057705 0.6330455 0.1670820 1.810680 4839
## [26] {onion, pepper} => {garlic} 0.1043716 0.6675521 0.1563497 1.909378 4775
## [27] {garlic, onion} => {pepper} 0.1043716 0.5884890 0.1773552 1.675902 4775
## [28] {garlic, pepper} => {onion} 0.1043716 0.5276826 0.1977923 1.739676 4775
rules.by.conf<-sort(rules1c, by="confidence", decreasing=TRUE)
inspect(rules.by.conf)
## lhs rhs support confidence coverage lift count
## [1] {flour} => {salt} 0.1270164 0.6705516 0.1894208 1.636582 5811
## [2] {garlic, salt} => {pepper} 0.1108415 0.6697926 0.1654863 1.907439 5071
## [3] {onion, pepper} => {garlic} 0.1043716 0.6675521 0.1563497 1.909378 4775
## [4] {olive, pepper} => {garlic} 0.1094645 0.6539566 0.1673880 1.870492 5008
## [5] {tomato} => {garlic} 0.1057705 0.6330455 0.1670820 1.810680 4839
## [6] {garlic, olive} => {pepper} 0.1094645 0.6150823 0.1779672 1.751635 5008
## [7] {flour} => {butter} 0.1162623 0.6137780 0.1894208 2.218738 5319
## [8] {olive} => {garlic} 0.1779672 0.6041852 0.2945574 1.728132 8142
## [9] {garlic, onion} => {pepper} 0.1043716 0.5884890 0.1773552 1.675902 4775
## [10] {flour} => {egg} 0.1113880 0.5880452 0.1894208 2.557569 5096
## [11] {pepper, salt} => {garlic} 0.1108415 0.5852279 0.1893989 1.673909 5071
## [12] {onion} => {garlic} 0.1773552 0.5847085 0.3033224 1.672424 8114
## [13] {egg} => {salt} 0.1307978 0.5688754 0.2299235 1.388426 5984
## [14] {olive} => {pepper} 0.1673880 0.5682695 0.2945574 1.618321 7658
## [15] {garlic} => {pepper} 0.1977923 0.5657393 0.3496175 1.611116 9049
## [16] {pepper} => {garlic} 0.1977923 0.5632742 0.3511475 1.611116 9049
## [17] {garlic, pepper} => {salt} 0.1108415 0.5603934 0.1977923 1.367725 5071
## [18] {flour} => {sugar} 0.1060109 0.5596584 0.1894208 1.906506 4850
## [19] {garlic, pepper} => {olive} 0.1094645 0.5534313 0.1977923 1.878857 5008
## [20] {egg} => {sugar} 0.1261421 0.5486263 0.2299235 1.868924 5771
## [21] {water} => {salt} 0.1139454 0.5396480 0.2111475 1.317092 5213
## [22] {pepper} => {salt} 0.1893989 0.5393713 0.3511475 1.316417 8665
## [23] {garlic, pepper} => {onion} 0.1043716 0.5276826 0.1977923 1.739676 4775
## [24] {onion} => {pepper} 0.1563497 0.5154572 0.3033224 1.467922 7153
## [25] {sugar} => {salt} 0.1495738 0.5095309 0.2935519 1.243587 6843
## [26] {garlic} => {olive} 0.1779672 0.5090341 0.3496175 1.728132 8142
## [27] {garlic} => {onion} 0.1773552 0.5072835 0.3496175 1.672424 8114
## [28] {butter} => {salt} 0.1391694 0.5030815 0.2766339 1.227846 6367
rules.by.lift<-sort(rules1c, by="lift", decreasing=TRUE)
inspect(rules.by.lift)
## lhs rhs support confidence coverage lift count
## [1] {flour} => {egg} 0.1113880 0.5880452 0.1894208 2.557569 5096
## [2] {flour} => {butter} 0.1162623 0.6137780 0.1894208 2.218738 5319
## [3] {onion, pepper} => {garlic} 0.1043716 0.6675521 0.1563497 1.909378 4775
## [4] {garlic, salt} => {pepper} 0.1108415 0.6697926 0.1654863 1.907439 5071
## [5] {flour} => {sugar} 0.1060109 0.5596584 0.1894208 1.906506 4850
## [6] {garlic, pepper} => {olive} 0.1094645 0.5534313 0.1977923 1.878857 5008
## [7] {olive, pepper} => {garlic} 0.1094645 0.6539566 0.1673880 1.870492 5008
## [8] {egg} => {sugar} 0.1261421 0.5486263 0.2299235 1.868924 5771
## [9] {tomato} => {garlic} 0.1057705 0.6330455 0.1670820 1.810680 4839
## [10] {garlic, olive} => {pepper} 0.1094645 0.6150823 0.1779672 1.751635 5008
## [11] {garlic, pepper} => {onion} 0.1043716 0.5276826 0.1977923 1.739676 4775
## [12] {garlic} => {olive} 0.1779672 0.5090341 0.3496175 1.728132 8142
## [13] {olive} => {garlic} 0.1779672 0.6041852 0.2945574 1.728132 8142
## [14] {garlic, onion} => {pepper} 0.1043716 0.5884890 0.1773552 1.675902 4775
## [15] {pepper, salt} => {garlic} 0.1108415 0.5852279 0.1893989 1.673909 5071
## [16] {onion} => {garlic} 0.1773552 0.5847085 0.3033224 1.672424 8114
## [17] {garlic} => {onion} 0.1773552 0.5072835 0.3496175 1.672424 8114
## [18] {flour} => {salt} 0.1270164 0.6705516 0.1894208 1.636582 5811
## [19] {olive} => {pepper} 0.1673880 0.5682695 0.2945574 1.618321 7658
## [20] {garlic} => {pepper} 0.1977923 0.5657393 0.3496175 1.611116 9049
## [21] {pepper} => {garlic} 0.1977923 0.5632742 0.3511475 1.611116 9049
## [22] {onion} => {pepper} 0.1563497 0.5154572 0.3033224 1.467922 7153
## [23] {egg} => {salt} 0.1307978 0.5688754 0.2299235 1.388426 5984
## [24] {garlic, pepper} => {salt} 0.1108415 0.5603934 0.1977923 1.367725 5071
## [25] {water} => {salt} 0.1139454 0.5396480 0.2111475 1.317092 5213
## [26] {pepper} => {salt} 0.1893989 0.5393713 0.3511475 1.316417 8665
## [27] {sugar} => {salt} 0.1495738 0.5095309 0.2935519 1.243587 6843
## [28] {butter} => {salt} 0.1391694 0.5030815 0.2766339 1.227846 6367
plot(rules1c, method="paracoord", control=list(reorder=TRUE))
Because the database contains over 1,000 ingredients and almost 46 thousand recipes, some interesting combinations are not detected at the thresholds used above. Therefore, with the support and confidence levels reduced, I searched for rules involving selected products. For products such as chocolate, which appear only in rules with about 1% support, the confidence can still be high. The highest confidence levels are recorded for the rules {chocolate, egg, flour, vanilla} => {sugar}, {cream, flour, vanilla} => {sugar} and {baking powder, butter, salt, vanilla} => {sugar}. Their support is about 0.01, whereas their confidence is over 99%.
rules_sugar<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="rhs", lhs="sugar"), control=list(verbose=F))
rules_sugar_byconf<-sort(rules_sugar, by="confidence", decreasing=TRUE)
inspect((rules_sugar_byconf)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {sugar} => {salt} 0.1495738 0.5095309 0.2935519 1.243587 6843
## [2] {sugar} => {egg} 0.1261421 0.4297096 0.2935519 1.868924 5771
## [3] {sugar} => {butter} 0.1197377 0.4078928 0.2935519 1.474486 5478
rules_sugar_1<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="lhs", rhs="sugar"), control=list(verbose=F))
rules_sugar_byconf_1<-sort(rules_sugar_1, by="confidence", decreasing=TRUE)
inspect((rules_sugar_byconf_1)[1:3], linebreak = FALSE)
## lhs rhs support confidence
## [1] {chocolate, egg, flour, vanilla} => {sugar} 0.01022951 0.9957447
## [2] {cream, flour, vanilla} => {sugar} 0.01300546 0.9949833
## [3] {baking powder, butter, salt, vanilla} => {sugar} 0.01241530 0.9947461
## coverage lift count
## [1] 0.01027322 3.392057 468
## [2] 0.01307104 3.389463 595
## [3] 0.01248087 3.388655 568
rules_egg<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="rhs", lhs="egg"), control=list(verbose=F))
rules_egg_byconf<-sort(rules_egg, by="confidence", decreasing=TRUE)
inspect((rules_egg_byconf)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {egg} => {salt} 0.1307978 0.5688754 0.2299235 1.388426 5984
## [2] {egg} => {sugar} 0.1261421 0.5486263 0.2299235 1.868924 5771
## [3] {egg} => {flour} 0.1113880 0.4844567 0.2299235 2.557569 5096
rules_egg_1<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="lhs", rhs="egg"), control=list(verbose=F))
rules_egg_byconf_1<-sort(rules_egg_1, by="confidence", decreasing=TRUE)
inspect((rules_egg_byconf_1)[1:3], linebreak = FALSE)
## lhs rhs support confidence
## [1] {baking soda, flour, salt, sugar, vanilla} => {egg} 0.01029508 0.9515152
## [2] {baking soda, flour, salt, vanilla} => {egg} 0.01036066 0.9480000
## [3] {baking soda, flour, sugar, vanilla} => {egg} 0.01208743 0.9469178
## coverage lift count
## [1] 0.01081967 4.138399 471
## [2] 0.01092896 4.123111 474
## [3] 0.01276503 4.118404 553
rules_milk<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="rhs", lhs="milk"), control=list(verbose=F))
rules_milk_byconf<-sort(rules_milk, by="confidence", decreasing=TRUE)
inspect((rules_milk_byconf)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {milk} => {egg} 0.05855738 0.5642376 0.1037814 2.454023 2679
## [2] {milk} => {butter} 0.05407650 0.5210615 0.1037814 1.883578 2474
## [3] {milk} => {salt} 0.05355191 0.5160067 0.1037814 1.259392 2450
rules_milk_1<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="lhs", rhs="milk"), control=list(verbose=F))
rules_milk_byconf_1<-sort(rules_milk_1, by="confidence", decreasing=TRUE)
inspect((rules_milk_byconf_1)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {baking powder, egg, flour, salt} => {milk} 0.01084153 0.3737754 0.02900546 3.601564 496
## [2] {baking powder, butter, flour} => {milk} 0.01147541 0.3733997 0.03073224 3.597944 525
## [3] {baking powder, butter, egg} => {milk} 0.01005464 0.3724696 0.02699454 3.588982 460
rules_bread<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="rhs", lhs="bread"), control=list(verbose=F))
rules_bread_byconf<-sort(rules_bread, by="confidence", decreasing=TRUE)
inspect((rules_bread_byconf)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {bread} => {pepper} 0.02824044 0.4877312 0.05790164 1.388964 1292
## [2] {bread} => {garlic} 0.02681967 0.4631937 0.05790164 1.324858 1227
## [3] {bread} => {olive} 0.02581421 0.4458286 0.05790164 1.513554 1181
rules_bread_1<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="lhs", rhs="bread"), control=list(verbose=F))
rules_bread_byconf_1<-sort(rules_bread_1, by="confidence", decreasing=TRUE)
inspect((rules_bread_byconf_1)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {egg, pepper} => {bread} 0.01156284 0.1916667 0.06032787 3.310211 529
## [2] {garlic, parsley} => {bread} 0.01101639 0.1504029 0.07324590 2.597558 504
## [3] {parsley, pepper} => {bread} 0.01044809 0.1432424 0.07293989 2.473893 478
rules_cinnamon<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="rhs", lhs="cinnamon"), control=list(verbose=F))
rules_cinnamon_byconf<-sort(rules_cinnamon, by="confidence", decreasing=TRUE)
inspect((rules_cinnamon_byconf)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {cinnamon} => {sugar} 0.05016393 0.6521739 0.07691803 2.221665 2295
## [2] {cinnamon} => {salt} 0.03849180 0.5004263 0.07691803 1.221366 1761
## [3] {cinnamon} => {butter} 0.03289617 0.4276783 0.07691803 1.546008 1505
rules_cinnamon_1<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="lhs", rhs="cinnamon"), control=list(verbose=F))
rules_cinnamon_byconf_1<-sort(rules_cinnamon_1, by="confidence", decreasing=TRUE)
inspect((rules_cinnamon_byconf_1)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {clove, sugar} => {cinnamon} 0.01110383 0.7851623 0.01414208 10.207779 508
## [2] {clove, salt} => {cinnamon} 0.01016393 0.6981982 0.01455738 9.077172 465
## [3] {nutmeg, sugar} => {cinnamon} 0.01193443 0.6876574 0.01735519 8.940133 546
rules_chocolate<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="rhs", lhs="chocolate"), control=list(verbose=F))
rules_chocolate_byconf<-sort(rules_chocolate, by="confidence", decreasing=TRUE)
inspect((rules_chocolate_byconf)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {chocolate} => {sugar} 0.03143169 0.7564440 0.04155191 2.576866 1438
## [2] {chocolate} => {egg} 0.02330055 0.5607575 0.04155191 2.438887 1066
## [3] {chocolate} => {butter} 0.02229508 0.5365597 0.04155191 1.939602 1020
rules_chocolate_1<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="lhs", rhs="chocolate"), control=list(verbose=F))
rules_chocolate_byconf_1<-sort(rules_chocolate_1, by="confidence", decreasing=TRUE)
inspect((rules_chocolate_byconf_1)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {cream, egg, sugar} => {chocolate} 0.01014208 0.2783443 0.03643716 6.698713 464
## [2] {butter, egg, sugar, vanilla} => {chocolate} 0.01090710 0.2668449 0.04087432 6.421965 499
## [3] {butter, egg, vanilla} => {chocolate} 0.01103825 0.2656497 0.04155191 6.393199 505
rules_wine<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="rhs", lhs="wine red"), control=list(verbose=F))
rules_wine_byconf<-sort(rules_wine, by="confidence", decreasing=TRUE)
inspect((rules_wine_byconf)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {wine red} => {garlic} 0.01108197 0.6007109 0.01844809 1.718195 507
## [2] {wine red} => {olive} 0.01060109 0.5746445 0.01844809 1.950875 485
## [3] {wine red} => {onion} 0.01014208 0.5497630 0.01844809 1.812471 464
rules_wine_1<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="lhs", rhs="wine red"), control=list(verbose=F))
rules_wine_byconf_1<-sort(rules_wine_1, by="confidence", decreasing=TRUE)
inspect((rules_wine_byconf_1)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {olive} => {wine red} 0.01060109 0.03598991 0.2945574 1.950875 485
## [2] {onion} => {wine red} 0.01014208 0.03343662 0.3033224 1.812471 464
## [3] {garlic} => {wine red} 0.01108197 0.03169741 0.3496175 1.718195 507
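The two wine tables contain the same item pairs in opposite directions, which shows that support and lift do not depend on the direction of a rule while confidence does. The sketch below recomputes both directions for the pair red wine and garlic from the values printed above; the variable names are illustrative assumptions.

```r
# Values copied from the two outputs above
supp_pair   <- 0.01108197  # support of {wine red, garlic}, identical in both tables
supp_wine   <- 0.01844809  # coverage of {wine red} => {garlic}
supp_garlic <- 0.3496175   # coverage of {garlic} => {wine red}

# Confidence divides by the support of the left-hand side, so it depends on direction
conf_wine_to_garlic <- supp_pair / supp_wine
conf_garlic_to_wine <- supp_pair / supp_garlic

# Lift divides by both single-item supports, so it is the same in both directions
lift_either_way <- supp_pair / (supp_wine * supp_garlic)

print(c(conf_wine_to_garlic = conf_wine_to_garlic,
        conf_garlic_to_wine = conf_garlic_to_wine,
        lift_either_way = lift_either_way))
```

Garlic is common in red wine recipes, but because garlic appears in about a third of all recipes, seeing garlic says little about whether red wine is present. This is why the confidence of the reversed rules stays near 3%.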
rules_chicken<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="rhs", lhs="chicken"), control=list(verbose=F))
rules_chicken_byconf<-sort(rules_chicken, by="confidence", decreasing=TRUE)
inspect((rules_chicken_byconf)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {chicken} => {garlic} 0.08756284 0.5683076 0.1540765 1.625512 4006
## [2] {chicken} => {onion} 0.08212022 0.5329834 0.1540765 1.757151 3757
## [3] {chicken} => {pepper} 0.07746448 0.5027663 0.1540765 1.431781 3544
rules_chicken_1<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="lhs", rhs="chicken"), control=list(verbose=F))
rules_chicken_byconf_1<-sort(rules_chicken_1, by="confidence", decreasing=TRUE)
inspect((rules_chicken_byconf_1)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {onion, rice} => {chicken} 0.01130055 0.4751838 0.02378142 3.084077 517
## [2] {garlic, rice} => {chicken} 0.01018579 0.4735772 0.02150820 3.073650 466
## [3] {carrot, celery} => {chicken} 0.01084153 0.4126456 0.02627322 2.678186 496
rules_tomato<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="rhs", lhs="tomato"), control=list(verbose=F))
rules_tomato_byconf<-sort(rules_tomato, by="confidence", decreasing=TRUE)
inspect((rules_tomato_byconf)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {tomato} => {garlic} 0.10577049 0.6330455 0.167082 1.810680 4839
## [2] {tomato} => {onion} 0.09702732 0.5807169 0.167082 1.914520 4439
## [3] {tomato} => {olive} 0.09106011 0.5450026 0.167082 1.850243 4166
rules_tomato_1<-apriori(data=trans1, parameter=list(supp=0.01,conf = 0.005, minlen=2), appearance=list(default="lhs", rhs="tomato"), control=list(verbose=F))
rules_tomato_byconf_1<-sort(rules_tomato_1, by="confidence", decreasing=TRUE)
inspect((rules_tomato_byconf_1)[1:3], linebreak = FALSE)
## lhs rhs support confidence coverage lift count
## [1] {basil, garlic, olive, onion} => {tomato} 0.01027322 0.7378336 0.01392350 4.415998 470
## [2] {basil, olive, onion} => {tomato} 0.01165027 0.7202703 0.01617486 4.310880 533
## [3] {basil, garlic, onion} => {tomato} 0.01366120 0.6565126 0.02080874 3.929285 625
The goal of association rule mining, given a set of transactions, is to find rules that allow us to predict the occurrence of a specific item based on the occurrences of the other items in the transaction. Association rules in the analyzed recipes dataset provided many insights. The Apriori algorithm was run with a support level of 0.10 and confidence levels of 0.50, 0.55 and 0.60; the item-specific rules above used a lower support of 0.01 and a minimum confidence of 0.005 so that less frequent ingredients could also appear. The algorithm found interesting patterns that are also encountered in real life. As I expected, the most common products and rules consist of spices and basic foodstuffs such as butter, olive or flour. Additionally, thanks to a couple of association rules I spotted some interesting occurrences: the use of flour ties in with the use of eggs (the rule {egg} => {flour} has a lift of about 2.56), which makes sense, as the database contains a lot of baking recipes (e.g. cakes).
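The item-by-item runs in this section repeat the same three calls (apriori(), sort(), inspect()) and change only the ingredient and the side of the rule on which it appears. As a minimal sketch, the pattern could be wrapped in a helper such as the one below. The function top_rules() and the toy transactions are hypothetical and were not part of the original analysis; the arules calls are the ones already used above.

```r
library(arules)

# Hypothetical helper: mine rules with `item` fixed on one side and
# return the n rules with the highest confidence.
top_rules <- function(trans, item, side = c("lhs", "rhs"), n = 3,
                      supp = 0.01, conf = 0.005) {
  side <- match.arg(side)
  # Fix `item` on the chosen side; all other items may appear on the opposite side
  appearance <- if (side == "lhs") {
    list(default = "rhs", lhs = item)
  } else {
    list(default = "lhs", rhs = item)
  }
  rules <- apriori(
    data = trans,
    parameter = list(supp = supp, conf = conf, minlen = 2),
    appearance = appearance,
    control = list(verbose = FALSE)
  )
  # head() returns whatever is available when fewer than n rules pass the thresholds
  head(sort(rules, by = "confidence", decreasing = TRUE), n)
}

# Toy data for illustration only (not taken from the recipes database)
toy <- as(list(
  c("egg", "flour", "sugar"),
  c("egg", "flour", "milk"),
  c("garlic", "olive", "tomato"),
  c("egg", "sugar")
), "transactions")

egg_rules <- top_rules(toy, "egg", side = "lhs", supp = 0.25, conf = 0.5)
inspect(egg_rules)
```

On the recipes data the equivalent call would be top_rules(trans1, "egg", side = "lhs"), assuming trans1 is the transactions object used throughout this section.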