Food recipe - Association rules

Introduction

Association rules are a fundamental concept in data mining and machine learning, serving as a powerful analytical tool to discover meaningful relationships and patterns within datasets. The primary objective of association rule mining is to reveal associations or correlations between items in a dataset based on their co-occurrence in transactions. This technique is particularly valuable in scenarios where understanding the inherent relationships between different items or variables can provide actionable insights.

The central notion revolves around identifying frequent itemsets, which are subsets of items that frequently appear together in the dataset. These itemsets are used to generate association rules that express relationships between items in the form of “if-then” statements. The rules consist of an antecedent (the “if” part) and a consequent (the “then” part). For example, if item X is present in a transaction, there is a high likelihood that item Y will also be present. Association rules are quantified by metrics such as support, confidence, and lift, which help evaluate the strength and significance of the identified associations.

Aim of the Study

This study applies association rule mining, specifically the Apriori and Eclat algorithms, to uncover patterns in a large food recipe dataset. The focus is on associations among ingredients within recipes, which reveal the components that commonly co-occur. The full dataset contains more than 500,000 recipes; for computational efficiency, the analysis uses only the first 20,000.

The overarching goal of this study is to extract actionable insights for culinary and retail industries. By identifying associations among ingredients, the study aims to provide valuable information for various applications, including recipe recommendation systems, inventory management, and strategic product placement. Additionally, the visualization of discovered association rules using the arulesViz package enhances the interpretability of the findings, facilitating a more intuitive understanding of the complex relationships within the culinary landscape. Through these analytical endeavors, the study aims to contribute to the enhancement of decision-making processes in culinary and retail domains.

# load packages
# plyr is attached before tidyverse so that it does not mask dplyr functions

library(plyr)
library(tidyverse)
library(arules)
library(arulesViz)
library(arulesCBA)
library(ggplot2)

About dataset

The Food.com recipes dataset contains 522,517 recipes from 312 different categories. This dataset provides information about each recipe like cooking times, servings, ingredients, nutrition, instructions, and more.

The reviews dataset contains 1,401,982 reviews from 271,907 different users. This dataset provides information about the author, rating, review text, and more.

# loading dataset 

df <- read.csv("recipes.csv", header = TRUE, sep=',')

# let's see first 2 rows of dataset

head(df, 2)

##   RecipeId                              Name AuthorId AuthorName CookTime
## 1       38 Low-Fat Berry Blue Frozen Dessert     1533     Dancer    PT24H
## 2       39                           Biryani     1567   elly9812    PT25M
##   PrepTime TotalTime        DatePublished
## 1    PT45M  PT24H45M 1999-08-09T21:46:00Z
## 2     PT4H   PT4H25M 1999-08-29T13:12:00Z
##                                                                   Description
## 1 Make and share this Low-Fat Berry Blue Frozen Dessert recipe from Food.com.
## 2                           Make and share this Biryani recipe from Food.com.
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Images
## 1 c("https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/38/YUeirxMLQaeE1h3v3qnM_229%20berry%20blue%20frzn%20dess.jpg", "https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/38/AFPDDHATWzQ0b1CDpDAT_255%20berry%20blue%20frzn%20dess.jpg", "https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/38/UYgf9nwMT2SGGJCuzILO_228%20berry%20blue%20frzn%20dess.jpg", "https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/38/PeBMJN2TGSaYks2759BA_20140722_202142.jpg", \n"https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/38/picuaETeN.jpg", "https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/38/pictzvxW5.jpg")
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          c("https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/39/picM9Mhnw.jpg", "https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/39/picHv4Ocr.jpg")
##    RecipeCategory
## 1 Frozen Desserts
## 2  Chicken Breast
##                                                                                                            Keywords
## 1 c("Dessert", "Low Protein", "Low Cholesterol", "Healthy", "Free Of...", "Summer", "Weeknight", "Freezer", "Easy")
## 2               c("Chicken Thigh & Leg", "Chicken", "Poultry", "Meat", "Asian", "Indian", "Weeknight", "Stove Top")
##                                                                                                                             RecipeIngredientQuantities
## 1                                                                                                                              c("4", "1/4", "1", "1")
## 2 c("1", "4", "2", "2", "8", "1/4", "8", "1/2", "1", "1", "1/4", "1/4", "1/2", "1/4", "2", "3", NA, "2", "1", "1", "8", "2", "1/3", "1/3", "1/3", "6")
##                                                                                                                                                                                                                                                                                                                      RecipeIngredientParts
## 1                                                                                                                                                                                                                                                                    c("blueberries", "granulated sugar", "vanilla yogurt", "lemon juice")
## 2 c("saffron", "milk", "hot green chili peppers", "onions", "garlic", "clove", "peppercorns", "cardamom seed", "cumin seed", "poppy seed", "mace", "cilantro", "mint leaf", "fresh lemon juice", "plain yogurt", "boneless chicken", "salt", "ghee", "onion", "tomatoes", "basmati rice", "long-grain rice", "raisins", "cashews", "eggs")
##   AggregatedRating ReviewCount Calories FatContent SaturatedFatContent
## 1              4.5           4    170.9        2.5                 1.3
## 2              3.0           1   1110.7       58.8                16.6
##   CholesterolContent SodiumContent CarbohydrateContent FiberContent
## 1                8.0          29.8                37.1          3.6
## 2              372.8         368.4                84.4          9.0
##   SugarContent ProteinContent RecipeServings RecipeYield
## 1         30.2            3.2              4        <NA>
## 2         20.4           63.4              6        <NA>
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          RecipeInstructions
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         c("Toss 2 cups berries with sugar.", "Let stand for 45 minutes, stirring occasionally.", "Transfer berry-sugar mixture to food processor.", "Add yogurt and process until smooth.", "Strain through fine sieve. Pour into baking pan (or transfer to ice cream maker and process according to manufacturers' directions). Freeze uncovered until edges are solid but centre is soft.  Transfer to processor and blend until smooth again.", "Return to pan and freeze until edges are solid.", "Transfer to processor and blend until smooth again.", \n"Fold in remaining 2 cups of blueberries.", "Pour into plastic mold and freeze overnight. Let soften slightly to serve.")
## 2 c("Soak saffron in warm milk for 5 minutes and puree in blender.", "Add chiles, onions, ginger, garlic, cloves, peppercorns, cardamom seeds, cinnamon, coriander and cumin seeds, poppy seeds, nutmeg, mace, cilantro or mint leaves and lemon juice. Blend into smooth paste. Put paste into large bowl, add yogurt and mix well.", "Marinate chicken in yogurt mixture with salt, covered for at least 2 - 6 hours in refrigerator.", "In skillet. heat oil over medium heat for 1 minute. Add ghee and 15 seconds later add onion and fry for about8 minutes.", \n"Reserve for garnish.", "In same skillet, cook chicken with its marinade with tomatoes for about 10 minutes over medium heat, uncovered.", "Remove chicken pieces from the sauce and set aside. Add rice to sauce, bring to boil, and cook, covered over low heat for 15 minutes.", "Return chicken and add raisins, cashews and almonds; mix well.", "Simmer, covered for 5 minutes.", "Place chicken, eggs and rice in large serving dish in such a way that yellow of the eggs, the saffron-colored rice, the nuts and the chicken make a colorful display.", \n"Add reserved onion as garnish.")

# check the data structure 
str(df)

## 'data.frame':    522517 obs. of  28 variables:
##  $ RecipeId                  : int  38 39 40 41 42 43 44 45 46 47 ...
##  $ Name                      : chr  "Low-Fat Berry Blue Frozen Dessert" "Biryani" "Best Lemonade" "Carina's Tofu-Vegetable Kebabs" ...
##  $ AuthorId                  : int  1533 1567 1566 1586 1538 34879 1596 1580 1533 1573 ...
##  $ AuthorName                : chr  "Dancer" "elly9812" "Stephen Little" "Cyclopz" ...
##  $ CookTime                  : chr  "PT24H" "PT25M" "PT5M" "PT20M" ...
##  $ PrepTime                  : chr  "PT45M" "PT4H" "PT30M" "PT24H" ...
##  $ TotalTime                 : chr  "PT24H45M" "PT4H25M" "PT35M" "PT24H20M" ...
##  $ DatePublished             : chr  "1999-08-09T21:46:00Z" "1999-08-29T13:12:00Z" "1999-09-05T19:52:00Z" "1999-09-03T14:54:00Z" ...
##  $ Description               : chr  "Make and share this Low-Fat Berry Blue Frozen Dessert recipe from Food.com." "Make and share this Biryani recipe from Food.com." "This is from one of my  first Good House Keeping cookbooks.  You must use a *zester* in order to avoid getting "| __truncated__ "This dish is best prepared a day in advance to allow the ingredients to soak in  the marinade overnight." ...
##  $ Images                    : chr  "c(\"https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/38/YUeirxMLQa"| __truncated__ "c(\"https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/39/picM9Mhnw."| __truncated__ "c(\"https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/40/picJ4Sz3N."| __truncated__ "c(\"https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/41/picmbLig8."| __truncated__ ...
##  $ RecipeCategory            : chr  "Frozen Desserts" "Chicken Breast" "Beverages" "Soy/Tofu" ...
##  $ Keywords                  : chr  "c(\"Dessert\", \"Low Protein\", \"Low Cholesterol\", \"Healthy\", \"Free Of...\", \"Summer\", \"Weeknight\", \""| __truncated__ "c(\"Chicken Thigh & Leg\", \"Chicken\", \"Poultry\", \"Meat\", \"Asian\", \"Indian\", \"Weeknight\", \"Stove Top\")" "c(\"Low Protein\", \"Low Cholesterol\", \"Healthy\", \"Summer\", \"< 60 Mins\")" "c(\"Beans\", \"Vegetable\", \"Low Cholesterol\", \"Weeknight\", \"Broil/Grill\", \"Oven\")" ...
##  $ RecipeIngredientQuantities: chr  "c(\"4\", \"1/4\", \"1\", \"1\")" "c(\"1\", \"4\", \"2\", \"2\", \"8\", \"1/4\", \"8\", \"1/2\", \"1\", \"1\", \"1/4\", \"1/4\", \"1/2\", \"1/4\","| __truncated__ "c(\"1 1/2\", \"1\", NA, \"1 1/2\", NA, \"3/4\")" "c(\"12\", \"1\", \"2\", \"1\", \"10\", \"1\", \"3\", \"2\", \"2\", \"2\", \"1\", \"2\", \"1/2\", \"1/4\", \"4\")" ...
##  $ RecipeIngredientParts     : chr  "c(\"blueberries\", \"granulated sugar\", \"vanilla yogurt\", \"lemon juice\")" "c(\"saffron\", \"milk\", \"hot green chili peppers\", \"onions\", \"garlic\", \"clove\", \"peppercorns\", \"car"| __truncated__ "c(\"sugar\", \"lemons, rind of\", \"lemon, zest of\", \"fresh water\", \"fresh lemon juice\")" "c(\"extra firm tofu\", \"eggplant\", \"zucchini\", \"mushrooms\", \"soy sauce\", \"low sodium soy sauce\", \"ol"| __truncated__ ...
##  $ AggregatedRating          : num  4.5 3 4.5 4.5 4.5 1 5 4 5 4 ...
##  $ ReviewCount               : int  4 1 10 2 11 1 23 3 2 2 ...
##  $ Calories                  : num  171 1111 311 536 104 ...
##  $ FatContent                : num  2.5 58.8 0.2 24 0.4 19.3 66.8 7.1 0 5.6 ...
##  $ SaturatedFatContent       : num  1.3 16.6 0 3.8 0.1 10.9 31.9 1.7 0 1.4 ...
##  $ CholesterolContent        : num  8 373 0 0 0 ...
##  $ SodiumContent             : num  29.8 368.4 1.8 1558.6 959.3 ...
##  $ CarbohydrateContent       : num  37.1 84.4 81.5 64.2 25.1 58 29.1 37.5 1.1 4.5 ...
##  $ FiberContent              : num  3.6 9 0.4 17.3 4.8 1.8 3.1 0.5 0.2 0.6 ...
##  $ SugarContent              : num  30.2 20.4 77.2 32.1 17.7 42.5 5 24.7 0.2 1.6 ...
##  $ ProteinContent            : num  3.2 63.4 0.3 29.3 4.3 7 45.3 4.2 0.1 0.8 ...
##  $ RecipeServings            : int  4 6 4 2 4 8 2 8 NA NA ...
##  $ RecipeYield               : chr  NA NA NA "4 kebabs" ...
##  $ RecipeInstructions        : chr  "c(\"Toss 2 cups berries with sugar.\", \"Let stand for 45 minutes, stirring occasionally.\", \"Transfer berry-s"| __truncated__ "c(\"Soak saffron in warm milk for 5 minutes and puree in blender.\", \"Add chiles, onions, ginger, garlic, clov"| __truncated__ "c(\"Into a 1 quart Jar with tight fitting lid, put sugar and lemon peel, or zest;  add 1 1/2 cups very hot wate"| __truncated__ "c(\"Drain the tofu, carefully squeezing out excess water,  and pat dry with paper towels.\", \"Cut tofu into on"| __truncated__ ...

Our analysis focuses on the “RecipeIngredientParts” column, because each entry holds all the ingredients of one recipe. The column therefore works like market-basket data: each entry is a transaction, and its ingredients are the items. Treating recipes as transactions lets us apply association rule mining to find patterns and relationships among ingredients.

# select first 20000 rows
df1 <- data.frame(recipe = df$RecipeIngredientParts[1:20000])

head(df1)

##                                                                                                                                                                                                                                                                                                                                     recipe
## 1                                                                                                                                                                                                                                                                    c("blueberries", "granulated sugar", "vanilla yogurt", "lemon juice")
## 2 c("saffron", "milk", "hot green chili peppers", "onions", "garlic", "clove", "peppercorns", "cardamom seed", "cumin seed", "poppy seed", "mace", "cilantro", "mint leaf", "fresh lemon juice", "plain yogurt", "boneless chicken", "salt", "ghee", "onion", "tomatoes", "basmati rice", "long-grain rice", "raisins", "cashews", "eggs")
## 3                                                                                                                                                                                                                                                      c("sugar", "lemons, rind of", "lemon, zest of", "fresh water", "fresh lemon juice")
## 4                                                                                                                c("extra firm tofu", "eggplant", "zucchini", "mushrooms", "soy sauce", "low sodium soy sauce", "olive oil", "maple syrup", "honey", "red wine vinegar", "lemon juice", "garlic cloves", "mustard powder", "black pepper")
## 5                                                                                                                                                                                                                                                                         c("plain tomato juice", "cabbage", "onion", "carrots", "celery")
## 6                                                                                                                                                                           c("graham cracker crumbs", "sugar", "butter", "sugar", "cornstarch", "salt", "milk", "vanilla extract", "water", "gelatin", "rum", "cream of tartar", "sugar")

Preprocessing

As the output above shows, each entry is stored as the text of an R vector, such as c("blueberries", "granulated sugar"), so it must be cleaned before it can be converted into transactional data. The cleaning step strips the leading ‘c’, the parentheses, and the quotes from every row.

clean_ingredients <- function(entry) {
  # Drop the leading c(" by keeping the text from the 4th character onward
  ingredients <- substr(entry, 4, nchar(entry))
  # Remove parentheses and quotes
  cleaned_ingredients <- gsub('["()]', '', ingredients)
  return(cleaned_ingredients)
}

# Apply the function to each row
df1$cleaned_ingredients <- lapply(df1$recipe, clean_ingredients)

# create a new data frame with the cleaned column
new_df <- data.frame(CleanedIngredientsColumn = sapply(df1$cleaned_ingredients, paste, collapse = ", "))

head(new_df, 3)

##                                                                                                                                                                                                                                                              CleanedIngredientsColumn
## 1                                                                                                                                                                                                                          blueberries, granulated sugar, vanilla yogurt, lemon juice
## 2 saffron, milk, hot green chili peppers, onions, garlic, clove, peppercorns, cardamom seed, cumin seed, poppy seed, mace, cilantro, mint leaf, fresh lemon juice, plain yogurt, boneless chicken, salt, ghee, onion, tomatoes, basmati rice, long-grain rice, raisins, cashews, eggs
## 3                                                                                                                                                                                                              sugar, lemons, rind of, lemon, zest of, fresh water, fresh lemon juice
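The cleaned output above shows a side effect of removing the quotes: an ingredient such as “lemons, rind of” contains a comma, so it can no longer be told apart from two separate ingredients. The sketch below is one possible alternative, not the method used in this analysis. It pulls out each quoted string as a whole, so ingredient names that contain commas stay intact and stray line breaks between entries are dropped. The helper name `extract_quoted` and the sample strings are illustrative assumptions; all results reported in this document come from the original cleaning function.

```r
# Two raw entries in the same c("...") text form as RecipeIngredientParts
# (sample strings for illustration; the second contains a line break)
raw <- c(
  'c("sugar", "lemons, rind of", "lemon, zest of", "fresh water")',
  'c("blueberries", "granulated sugar", \n"vanilla yogurt")'
)

# Hypothetical helper: return the text found between each pair of double quotes
extract_quoted <- function(entry) {
  hits <- regmatches(entry, gregexpr('"[^"]*"', entry))[[1]]
  trimws(gsub('"', '', hits, fixed = TRUE))
}

items <- lapply(raw, extract_quoted)
print(items[[1]])
print(items[[2]])
```

A list built this way has the same shape as the one produced by strsplit, so it could be passed to as(items, "transactions") in the same manner. The item counts and rules would then differ from the ones shown in this document.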

At this stage, our dataset is prepared for conversion into transactional format. Subsequently, we can leverage this transactional data to implement association rule mining algorithms.

# split each row into a vector of ingredients
trans <- strsplit(new_df$CleanedIngredientsColumn, ", ")

# Convert the data to a transaction format
trans <- as(trans, "transactions")

## Warning in asMethod(object): removing duplicated items in transactions

# check first 5 transactions
inspect(trans[1:5])

##     items                     
## [1] {blueberries,             
##      granulated sugar,        
##      lemon juice,             
##      vanilla yogurt}          
## [2] {basmati rice,            
##      boneless chicken,        
##      cardamom seed,           
##      cashews,                 
##      cilantro,                
##      clove,                   
##      cumin seed,              
##      eggs,                    
##      fresh lemon juice,       
##      garlic,                  
##      ghee,                    
##      hot green chili peppers, 
##      long-grain rice,         
##      mace,                    
##      milk,                    
##      mint leaf,               
##      onion,                   
##      onions,                  
##      peppercorns,             
##      plain yogurt,            
##      poppy seed,              
##      raisins,                 
##      saffron,                 
##      salt,                    
##      tomatoes}                
## [3] {fresh lemon juice,       
##      fresh water,             
##      lemon,                   
##      lemons,                  
##      rind of,                 
##      sugar,                   
##      zest of}                 
## [4] {black pepper,            
##      eggplant,                
##      extra firm tofu,         
##      garlic cloves,           
##      honey,                   
##      lemon juice,             
##      low sodium soy sauce,    
##      maple syrup,             
##      mushrooms,               
##      mustard powder,          
##      olive oil,               
##      red wine vinegar,        
##      soy sauce,               
##      zucchini}                
## [5] {cabbage,                 
##      carrots,                 
##      celery,                  
##      onion,                   
##      plain tomato juice}

Across the 20,000 recipes, which ingredients are used most often?

# Plotting the Most Frequent Ingredients in Foods

itemFrequencyPlot(trans, topN = 15, type = "relative", 
                  main = "Frequent Ingredients in Foods - Item Frequency", 
                  col = rainbow(20))

Salt is the most common ingredient, appearing in 7,768 of the 20,000 recipes (about 39%). Butter and sugar follow in second and third place. Eggs and baking powder are used less often. Next, we use the summary function to examine the characteristics of the transactions.

summary(trans)

## transactions as itemMatrix in sparse format with
##  20000 rows (elements/itemsets/transactions) and
##  4254 columns (items) and a density of 0.001770346 
## 
## most frequent items:
##    salt  butter   sugar   onion   water (Other) 
##    7768    5351    4857    3770    3627  125248 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
##  353  696 1093 1703 2127 2401 2352 2143 1907 1560 1164  821  583  429  247  174 
##   17   18   19   20   21   22   23   24   25   26   27   28   30   39 
##   90   57   40   27    7    9    5    1    3    4    1    1    1    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   5.000   7.000   7.531  10.000  39.000 
## 
## includes extended item information - examples:
##              labels
## 1      low-fat milk
## 2          \nyogurt
## 3 1% fat buttermilk

The dataset comprises 20,000 transactions and 4,254 unique items, with a density of about 0.0018. Salt and butter are the most frequent items, with 7,768 and 5,351 occurrences respectively. The most common transaction length is 6 items (2,401 transactions); the median is 7 and the mean is about 7.5. The longest transaction contains 39 items and occurs only once.
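The figures in the summary output can be cross-checked by hand. The sketch below copies the length distribution and the top item counts from the summary(trans) output into plain vectors (the variable names are illustrative) and recomputes the most common transaction length, the mean length, the density, and the share of recipes that contain each of the top items.

```r
# Length distribution copied from the summary(trans) output
sizes  <- c(1:28, 30, 39)
counts <- c(353, 696, 1093, 1703, 2127, 2401, 2352, 2143, 1907, 1560,
            1164, 821, 583, 429, 247, 174, 90, 57, 40, 27,
            7, 9, 5, 1, 3, 4, 1, 1, 1, 1)

n_trans <- sum(counts)      # number of transactions
n_items <- 4254             # number of distinct items (columns)

modal_size <- sizes[which.max(counts)]        # most common transaction length
mean_len   <- sum(sizes * counts) / n_trans   # average items per transaction
density    <- mean_len / n_items              # share of filled matrix cells

# Share of transactions that contain each of the five most frequent items
top_items  <- c(salt = 7768, butter = 5351, sugar = 4857, onion = 3770, water = 3627)
top_shares <- top_items / n_trans

print(c(modal_size = modal_size, mean_len = mean_len, density = density))
print(round(top_shares, 3))
```

The density is simply the mean transaction length divided by the number of distinct items, which is why a matrix with thousands of columns and about 7.5 items per row is so sparse.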

Apriori Algorithm

The Apriori algorithm is a classic association rule mining technique. It finds frequent itemsets in a transactional dataset by searching breadth-first, level by level: frequent itemsets of size k are used to build candidate itemsets of size k + 1. The search relies on the Apriori principle that every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset can be pruned without counting its support. Association rules are then generated from the frequent itemsets that remain.
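The level-wise search can be illustrated in a few lines of base R. The following is a minimal sketch for intuition only, written on hypothetical toy baskets; it is not how the arules package implements apriori(), and the function names `support` and `frequent_itemsets` are illustrative.

```r
# Toy baskets (hypothetical data, not taken from the recipe dataset)
baskets <- list(
  c("butter", "flour", "sugar"),
  c("butter", "flour", "eggs"),
  c("butter", "sugar"),
  c("flour", "sugar", "eggs"),
  c("butter", "flour", "sugar", "eggs")
)

# Fraction of baskets that contain every item of `itemset`
support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Level-wise search for all itemsets with support >= min_support
frequent_itemsets <- function(baskets, min_support) {
  items <- sort(unique(unlist(baskets)))
  # Level 1: frequent single items
  current <- Filter(function(s) support(s, baskets) >= min_support, as.list(items))
  result <- current
  k <- 2
  while (length(current) > 0) {
    # Only items that survive in a frequent (k-1)-itemset can appear at level k
    pool <- sort(unique(unlist(current)))
    if (length(pool) < k) break
    candidates <- combn(pool, k, simplify = FALSE)
    # Apriori pruning: keep a candidate only if all its (k-1)-subsets are frequent
    keys <- vapply(current, paste, character(1), collapse = "|")
    candidates <- Filter(function(cand) {
      subs <- combn(cand, k - 1, simplify = FALSE)
      all(vapply(subs, paste, character(1), collapse = "|") %in% keys)
    }, candidates)
    # Count support only for the candidates that survived pruning
    current <- Filter(function(s) support(s, baskets) >= min_support, candidates)
    result <- c(result, current)
    k <- k + 1
  }
  result
}

res <- frequent_itemsets(baskets, min_support = 0.5)
print(vapply(res, paste, character(1), collapse = ", "))
```

With a threshold of 0.5, the pair {butter, eggs} is infrequent in the toy data, so every three-item candidate that contains it is discarded before its support is counted. That saving is what makes the approach practical on thousands of items.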

Support

Support measures how often an itemset occurs in the dataset. It is the ratio of the number of transactions containing the itemset to the total number of transactions. A high support value means the itemset appears frequently.

Confidence

Confidence measures the reliability of an association rule. It is the ratio of the support of the antecedent and consequent combined to the support of the antecedent alone. A high confidence means that transactions containing the antecedent are likely to contain the consequent as well.

Lift

Lift compares the confidence of a rule with the support of its consequent. A lift of 1 means that the antecedent and consequent occur together as often as expected if they were independent; values above 1 mean that they co-occur more often than expected.
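For reference, the measures can be written compactly. The notation below is the standard textbook formulation and is an addition for clarity: T is the set of transactions, X is the antecedent, and Y is the consequent. The coverage column reported by arules corresponds to the support of X.

```latex
\mathrm{supp}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert}

\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}

\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
  = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}
```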

# apply Apriori algorithm to discover association rules
rules <- apriori(trans, parameter = list(support = 0.006,
                confidence = 0.4, minlen=2, maxlen = 15, target = "rules"))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.4    0.1    1 none FALSE            TRUE       5   0.006      2
##  maxlen target  ext
##      15  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 120 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[4254 item(s), 20000 transaction(s)] done [0.08s].
## sorting and recoding items ... [196 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 6 done [0.02s].
## writing ... [1532 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

# show summary of rules
summary(rules)

## set of 1532 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3   4   5   6 
## 165 595 550 204  18 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   4.000   3.553   4.000   6.000 
## 
## summary of quality measures:
##     support          confidence        coverage            lift       
##  Min.   :0.00600   Min.   :0.4000   Min.   :0.00635   Min.   : 1.030  
##  1st Qu.:0.00720   1st Qu.:0.4773   1st Qu.:0.01229   1st Qu.: 1.869  
##  Median :0.00895   Median :0.5567   Median :0.01655   Median : 2.539  
##  Mean   :0.01230   Mean   :0.5804   Mean   :0.02244   Mean   : 3.165  
##  3rd Qu.:0.01315   3rd Qu.:0.6794   3rd Qu.:0.02460   3rd Qu.: 3.765  
##  Max.   :0.11970   Max.   :0.9449   Max.   :0.26755   Max.   :30.916  
##      count     
##  Min.   : 120  
##  1st Qu.: 144  
##  Median : 179  
##  Mean   : 246  
##  3rd Qu.: 263  
##  Max.   :2394  
## 
## mining info:
##   data ntransactions support confidence
##  trans         20000   0.006        0.4
##                                                                                                                   call
##  apriori(data = trans, parameter = list(support = 0.006, confidence = 0.4, minlen = 2, maxlen = 15, target = "rules"))

The Apriori algorithm generated 1,532 rules. The median rule length is 4 items and the mean is about 3.6, counting both sides of the rule. The maximum lift is nearly 31, far above the third quartile (Q3) of lift, which is about 3.8.

inspect(head(rules, n = 10, by = "lift"))

##      lhs                               rhs           support confidence
## [1]  {rind of}                      => {lemon}       0.00850 0.7296137 
## [2]  {buttermilk, sugar}            => {baking soda} 0.00780 0.8571429 
## [3]  {buttermilk, flour}            => {baking soda} 0.00640 0.8311688 
## [4]  {buttermilk, eggs}             => {baking soda} 0.00635 0.8089172 
## [5]  {clove}                        => {cinnamon}    0.00725 0.7795699 
## [6]  {buttermilk, salt}             => {baking soda} 0.00900 0.7627119 
## [7]  {cinnamon, eggs, flour, salt}  => {baking soda} 0.00675 0.7258065 
## [8]  {cinnamon, eggs, flour, sugar} => {baking soda} 0.00665 0.7150538 
## [9]  {nutmeg, salt, sugar}          => {cinnamon}    0.00600 0.7185629 
## [10] {cinnamon, eggs, flour}        => {baking soda} 0.00845 0.6897959 
##      coverage lift     count
## [1]  0.01165  30.91584 170  
## [2]  0.00910  13.59465 156  
## [3]  0.00770  13.18269 128  
## [4]  0.00785  12.82977 127  
## [5]  0.00930  12.25739 145  
## [6]  0.01180  12.09694 180  
## [7]  0.00930  11.51160 135  
## [8]  0.00930  11.34106 133  
## [9]  0.00835  11.29816 120  
## [10] 0.01225  10.94046 169

The rule {rind of} => {lemon} has a confidence of 0.73 and a lift of about 31. At face value this is a strong association, but it is an artifact of preprocessing: source ingredients such as “lemons, rind of” contain a comma, so splitting on “, ” breaks them into separate items such as “rind of”. The remaining rules are more informative. Several show a strong association between “baking soda” and combinations of “buttermilk” with “sugar,” “flour,” “eggs,” or “salt.” For example, {buttermilk, sugar} => {baking soda} has a confidence of 0.8571, so about 86% of the recipes that contain both “buttermilk” and “sugar” also contain “baking soda.”
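The columns of the inspect() output are tied together by simple arithmetic, which makes them easy to cross-check. The sketch below takes the reported figures for {buttermilk, sugar} => {baking soda} (count 156, coverage 0.00910, lift 13.59465) and recomputes support and confidence. It also backs out the support of “baking soda” that the lift implies; that last value is derived here from the rounded output and is not reported directly in this document. The variable names are illustrative.

```r
n_trans    <- 20000      # transactions in the dataset
rule_count <- 156        # recipes containing buttermilk, sugar and baking soda
coverage   <- 0.00910    # support of the antecedent {buttermilk, sugar}
lift_rep   <- 13.59465   # lift as reported by inspect()

support    <- rule_count / n_trans    # support of the whole rule
confidence <- support / coverage      # share of antecedent recipes with baking soda

# Lift is confidence divided by the support of the consequent,
# so the consequent's support can be recovered from the reported lift
rhs_support <- confidence / lift_rep
rhs_count   <- rhs_support * n_trans  # implied number of recipes with baking soda

print(c(support = support, confidence = confidence,
        rhs_support = rhs_support, rhs_count = rhs_count))
```

Read this way, a lift of about 13.6 says that baking soda is roughly 13.6 times more common in recipes with buttermilk and sugar than in the dataset as a whole.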

Visualizations

Let’s visualize the rules using the arulesViz package.

plot(rules, measure=c("support","lift"), shading="confidence",
     main="Ingredient transactions rules")

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

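In the scatter plot, a visible group of rules has confidence between 0.7 and 0.9, and most rules have lift between 2 and 10. The summary above puts the third quartile of confidence at 0.68, so these high-confidence rules make up at most a quarter of the total.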

# the graph method does not accept a main argument, so no title is set
plot(rules, method = "graph", measure = "support",
     shading = "lift")

## Warning: Too many rules supplied. Only plotting the best 100 using 'lift'
## (change control parameter max if needed).

The visualization shows baking soda, eggs, flour, and salt positioned centrally in the graph. These ingredients occur together frequently, which, given the buttermilk and cinnamon rules listed above, points to baked goods such as cakes and quick breads.

plot(rules, method = "two-key plot")

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

In the two-key plot, the order is the number of ingredients in a rule, counting both sides. Rules of order 2 are rarer than rules of higher order, but their support is higher.

plot(rules, method = "matrix", engine = "htmlwidget")

## Warning: Too many rules supplied. Only plotting the best 1000 using 'lift'
## (change control parameter max if needed).

Parallel coordinates plots visualize multidimensional data: each dimension gets its own position on the x-axis, and all dimensions share the y-axis. Each data point is drawn as a line connecting its values across the dimensions. We plot the top 50 rules in descending order of lift.

# top 50 rules in descending order of lift
rules.top50 <- sort(rules, by="lift", decreasing=TRUE)[1:50]

plot(rules.top50, method="paracoord")

Parameters tuning

We now look more closely at the Apriori algorithm and examine how the appearance and control parameters affect the results.

We raise the minimum support to 0.009 and the confidence to 0.7, and use the appearance argument to name “butter” as the right-hand-side item. The first run targets frequent itemsets.

itemset <- apriori(trans, parameter=list(support=0.009,confidence=0.7,
                minlen=2, maxlen=10, target="frequent itemsets"), 
                appearance=list(default="lhs", rhs="butter"), 
                control=list(sort=0, verbose=TRUE))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##          NA    0.1    1 none FALSE            TRUE       5   0.009      2
##  maxlen            target  ext
##      10 frequent itemsets TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    0    TRUE
## 
## Absolute minimum support count: 180 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[4254 item(s), 20000 transaction(s)] done [0.05s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 done [0.02s].
## sorting transactions ... done [0.00s].
## writing ... [598 set(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

summary(itemset)

## set of 598 itemsets
## 
## most frequent items:
##    salt   sugar  butter   flour    eggs (Other) 
##     224     150     145     127     124     837 
## 
## element (itemset/transaction) length distribution:sizes
##   2   3   4   5 
## 287 218  86   7 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.000   3.000   2.687   3.000   5.000 
## 
## summary of quality measures:
##     support            count       
##  Min.   :0.00900   Min.   : 180.0  
##  1st Qu.:0.01080   1st Qu.: 216.0  
##  Median :0.01362   Median : 272.5  
##  Mean   :0.01839   Mean   : 367.9  
##  3rd Qu.:0.01974   3rd Qu.: 394.8  
##  Max.   :0.11970   Max.   :2394.0  
## 
## includes transaction ID lists: FALSE 
## 
## mining info:
##   data ntransactions support confidence
##  trans         20000   0.009          1
##                                                                                                                                                                                                                             call
##  apriori(data = trans, parameter = list(support = 0.009, confidence = 0.7, minlen = 2, maxlen = 10, target = "frequent itemsets"), appearance = list(default = "lhs", rhs = "butter"), control = list(sort = 0, verbose = TRUE))

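The Apriori algorithm finds 598 frequent itemsets of size 2 to 5. Because the target is frequent itemsets, no confidence threshold is applied (the parameter specification reports NA), and the itemsets are not limited to those containing “butter”: it appears in 145 of the 598. We inspect the first 10 itemsets.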

inspect(head(itemset, n=10))

##      items                  support count
## [1]  {butter, lemon juice}  0.01865  373 
## [2]  {butter, eggs}         0.07335 1467 
## [3]  {butter, garlic}       0.02100  420 
## [4]  {butter, milk}         0.06305 1261 
## [5]  {butter, onion}        0.04315  863 
## [6]  {butter, onions}       0.01330  266 
## [7]  {butter, raisins}      0.00915  183 
## [8]  {butter, salt}         0.11970 2394 
## [9]  {butter, sugar}        0.08495 1699 
## [10] {black pepper, butter} 0.01105  221

# plot the frequent itemsets as a graph

plot(itemset, method = "graph")

## Warning: Too many itemsets supplied. Only plotting the best 100 using 'support'
## (change control parameter max if needed).

Salt is the most prevalent ingredient, appearing in 224 of the 598 frequent itemsets, followed by sugar (150) and butter (145). In contrast, pungent ingredients such as garlic, pepper, and onion occur in comparatively few itemsets.

# now discover itemset rules by defining target = rules

rules_butter <- apriori(trans, parameter=list(support=0.009,confidence=0.5,
                minlen=2, maxlen=15, target="rules"), 
                appearance=list(default="lhs", rhs="butter"), 
                control=list(sort=0, verbose=TRUE))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.009      2
##  maxlen target  ext
##      15  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    0    TRUE
## 
## Absolute minimum support count: 180 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[4254 item(s), 20000 transaction(s)] done [0.05s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 done [0.02s].
## writing ... [65 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

summary(rules_butter)

## set of 65 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2  3  4  5 
##  6 38 20  1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   3.000   3.246   4.000   5.000 
## 
## summary of quality measures:
##     support          confidence        coverage            lift      
##  Min.   :0.00900   Min.   :0.5000   Min.   :0.01300   Min.   :1.869  
##  1st Qu.:0.01060   1st Qu.:0.5160   1st Qu.:0.01890   1st Qu.:1.929  
##  Median :0.01305   Median :0.5318   Median :0.02345   Median :1.988  
##  Mean   :0.01705   Mean   :0.5610   Mean   :0.03098   Mean   :2.097  
##  3rd Qu.:0.01810   3rd Qu.:0.5961   3rd Qu.:0.03220   3rd Qu.:2.228  
##  Max.   :0.07695   Max.   :0.7654   Max.   :0.14885   Max.   :2.861  
##      count       
##  Min.   : 180.0  
##  1st Qu.: 212.0  
##  Median : 261.0  
##  Mean   : 341.1  
##  3rd Qu.: 362.0  
##  Max.   :1539.0  
## 
## mining info:
##   data ntransactions support confidence
##  trans         20000   0.009        0.5
##                                                                                                                                                                                                                 call
##  apriori(data = trans, parameter = list(support = 0.009, confidence = 0.5, minlen = 2, maxlen = 15, target = "rules"), appearance = list(default = "lhs", rhs = "butter"), control = list(sort = 0, verbose = TRUE))

With these settings, there are 65 rules with “butter” on the right-hand side.

# plot the rules as a graph

plot(rules_butter, method = "graph")

# interactive version of the graph plot
plot(rules_butter, method = "graph", engine = "htmlwidget")

Eclat Algorithm Overview

The Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is a frequent itemset mining algorithm used to discover associations in transactional datasets. Unlike Apriori, Eclat uses a vertical data layout: for each item it stores the list of transactions that contain it. The support of an itemset is then obtained by intersecting the transaction lists of its items, which avoids repeated scans of the whole dataset.
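The vertical layout can be illustrated with a small base R sketch. It is not the arules implementation, only an illustration of the idea under simple assumptions: the recipes, the names tid_lists and itemset_support, and the data are all hypothetical.

```r
# Toy transactions (hypothetical recipes, not taken from the dataset)
recipes <- list(
  r1 = c("butter", "flour", "salt"),
  r2 = c("butter", "salt"),
  r3 = c("flour", "salt", "sugar"),
  r4 = c("butter", "flour", "salt", "sugar")
)

# Vertical layout: for each item, the IDs of the transactions that contain it
items <- sort(unique(unlist(recipes)))
tid_lists <- lapply(setNames(items, items), function(item) {
  names(recipes)[vapply(recipes, function(r) item %in% r, logical(1))]
})

# Support of an itemset = size of the intersection of its items' ID lists,
# divided by the number of transactions
itemset_support <- function(itemset) {
  tids <- Reduce(intersect, tid_lists[itemset])
  length(tids) / length(recipes)
}

itemset_support(c("butter", "salt"))           # 3 of 4 recipes
itemset_support(c("butter", "flour", "salt"))  # 2 of 4 recipes
```

Extending an itemset by one item only requires one more intersection with an already computed ID list, which is why this layout suits a depth-first search.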

Some parameters explanation

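support: Specifies the minimum support for an itemset to be considered frequent. It is a value between 0 and 1.
minlen: Specifies the minimum length of itemsets to be mined; minlen=2 excludes single items.
maxlen: Specifies the maximum length of itemsets to be mined. It can be useful to limit the search space.
target: Specifies the type of itemsets to mine, for example “frequent itemsets”, “maximally frequent itemsets”, or “closed frequent itemsets”.
appearance: A list of item appearance constraints. It is an argument of apriori(), as used in the previous section; the eclat() call below does not use it.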

# apply Eclat algorithm
freq_items <- eclat(trans, parameter=list(support=0.009, minlen=2, maxlen=15))

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE   0.009      2     15 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 180 
## 
## create itemset ... 
## set transactions ...[4254 item(s), 20000 transaction(s)] done [0.07s].
## sorting and recoding items ... [138 item(s)] done [0.00s].
## creating sparse bit matrix ... [138 row(s), 20000 column(s)] done [0.00s].
## writing  ... [598 set(s)] done [0.04s].
## Creating S4 object  ... done [0.00s].

# show summary statistics of frequent itemset
summary(freq_items)

## set of 598 itemsets
## 
## most frequent items:
##    salt   sugar  butter   flour    eggs (Other) 
##     224     150     145     127     124     837 
## 
## element (itemset/transaction) length distribution:sizes
##   2   3   4   5 
## 287 218  86   7 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.000   3.000   2.687   3.000   5.000 
## 
## summary of quality measures:
##     support            count       
##  Min.   :0.00900   Min.   : 180.0  
##  1st Qu.:0.01080   1st Qu.: 216.0  
##  Median :0.01362   Median : 272.5  
##  Mean   :0.01839   Mean   : 367.9  
##  3rd Qu.:0.01974   3rd Qu.: 394.8  
##  Max.   :0.11970   Max.   :2394.0  
## 
## includes transaction ID lists: FALSE 
## 
## mining info:
##   data ntransactions support
##  trans         20000   0.009
##                                                                             call
##  eclat(data = trans, parameter = list(support = 0.009, minlen = 2, maxlen = 15))

The Eclat algorithm identifies 598 frequent itemsets, the same number that Apriori found at the same support threshold; itemsets of size 2 are the most common (287). Using these frequent itemsets, we next derive association rules for the transactions in the dataset.

## Create rules from the itemsets
rules_eclat <- ruleInduction(freq_items, trans, confidence = .6)
summary(rules_eclat)

## set of 289 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3   4   5 
##  17 122 128  22 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   4.000   3.536   4.000   5.000 
## 
## summary of quality measures:
##     support          confidence          lift           itemset     
##  Min.   :0.00900   Min.   :0.6011   Min.   : 1.549   Min.   :  1.0  
##  1st Qu.:0.01040   1st Qu.:0.6426   1st Qu.: 1.937   1st Qu.:227.0  
##  Median :0.01285   Median :0.6908   Median : 2.554   Median :323.0  
##  Mean   :0.01619   Mean   :0.6990   Mean   : 2.805   Mean   :315.3  
##  3rd Qu.:0.01735   3rd Qu.:0.7506   3rd Qu.: 3.203   3rd Qu.:422.0  
##  Max.   :0.09060   Max.   :0.8511   Max.   :12.097   Max.   :583.0  
## 
## mining info:
##   data ntransactions support
##  trans         20000   0.009
##                                                                             call
##  eclat(data = trans, parameter = list(support = 0.009, minlen = 2, maxlen = 15))
##  confidence
##         0.6

inspect(tail(rules_eclat))

##     lhs                       rhs    support confidence lift     itemset
## [1] {flour, sugar, water}  => {salt} 0.00975 0.7358491  1.894565 544    
## [2] {flour, water}         => {salt} 0.01870 0.6666667  1.716444 545    
## [3] {butter, flour, sugar} => {salt} 0.02265 0.6088710  1.567639 550    
## [4] {flour, sugar}         => {salt} 0.04685 0.6697641  1.724418 551    
## [5] {flour}                => {salt} 0.09060 0.6086664  1.567112 554    
## [6] {butter, sugar, water} => {salt} 0.00920 0.6133333  1.579128 583
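The step from a frequent itemset to rules can be sketched in base R for the itemset {flour, salt}. This is a simplified illustration and not the internals of ruleInduction(): the individual supports of “flour” and “salt” are not printed in the output, so they are derived here from the support, confidence, and lift of rule [5], and the names candidates and min_conf are illustrative.

```r
# Supports derived from rule [5] {flour} => {salt} in the table above:
#   supp({flour}) = support / confidence
#   supp({salt})  = confidence / lift
supp_both  <- 0.09060
supp_flour <- 0.09060 / 0.6086664
supp_salt  <- 0.6086664 / 1.567112

min_conf <- 0.6  # same threshold as in the ruleInduction() call

# Candidate rules from {flour, salt}: each item in turn as the right-hand side
candidates <- data.frame(
  lhs        = c("flour", "salt"),
  rhs        = c("salt", "flour"),
  confidence = c(supp_both / supp_flour, supp_both / supp_salt)
)
candidates$keep <- candidates$confidence >= min_conf
candidates
```

Only {flour} => {salt} passes the threshold. The derived support of “salt” is about 0.39, which also explains why the rules in the table pair confidences of 0.6 to 0.74 with lifts below 2: salt is present in a large share of recipes regardless of the left-hand side.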

In total, 289 rules are induced from the frequent itemsets at a confidence threshold of 0.6. We create a parallel coordinates plot of the first 50 rules in rules_eclat.

plot(rules_eclat[1:50], method="paracoord")

The ingredients that appear most frequently on the right-hand side are butter, flour, eggs, and salt. Meanwhile, spices such as garlic and onion appear only once.

plot(rules_eclat, method = "graph")

## Warning: Too many rules supplied. Only plotting the best 100 using 'lift'
## (change control parameter max if needed).

WEclat

The weclat algorithm (Weighted Eclat) is an extension of the Eclat algorithm designed to handle weighted transaction data. It performs weighted association rule mining (WARM): each transaction carries a weight, and these weights are taken into account when computing the support of an itemset.
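As a rough illustration of the difference, the base R sketch below compares unweighted and weighted support on toy data. It assumes that weighted support is the share of the total transaction weight carried by the transactions containing the itemset; the recipes, the weights, and the function names are hypothetical and not taken from the dataset or from arules.

```r
# Toy data (hypothetical): four recipes with made-up weights
recipes <- list(
  c("butter", "salt"),
  c("butter", "flour", "salt"),
  c("flour", "sugar"),
  c("butter", "sugar")
)
weights <- c(10, 30, 40, 20)

# TRUE for each recipe that contains every item of the itemset
contains <- function(itemset) {
  vapply(recipes, function(r) all(itemset %in% r), logical(1))
}

# Unweighted support: share of transactions containing the itemset
support_plain <- function(itemset) mean(contains(itemset))

# Weighted support (assumed definition): share of total weight
# carried by the transactions containing the itemset
support_weighted <- function(itemset) {
  sum(weights[contains(itemset)]) / sum(weights)
}

support_plain(c("butter", "salt"))     # 2 of 4 recipes
support_weighted(c("butter", "salt"))  # (10 + 30) / 100
```

An itemset that sits mostly in low-weight transactions can therefore fall below the support threshold even though its unweighted support would pass.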

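# random placeholder weights: 1:40 + 1:10 recycles 1:10, so the values drawn are even integers from 2 to 50
# no seed is set, so the weights (and the results below) change from run to run
weight <- sample(1:40 + 1:10, size = 20000, replace  = TRUE)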

## add weight information
transactionInfo(trans) <- data.frame(weight = weight)
inspect(trans[1:5])

##     items                      weight
## [1] {blueberries,                    
##      granulated sugar,               
##      lemon juice,                    
##      vanilla yogurt}               10
## [2] {basmati rice,                   
##      boneless chicken,               
##      cardamom seed,                  
##      cashews,                        
##      cilantro,                       
##      clove,                          
##      cumin seed,                     
##      eggs,                           
##      fresh lemon juice,              
##      garlic,                         
##      ghee,                           
##      hot green chili peppers,        
##      long-grain rice,                
##      mace,                           
##      milk,                           
##      mint leaf,                      
##      onion,                          
##      onions,                         
##      peppercorns,                    
##      plain yogurt,                   
##      poppy seed,                     
##      raisins,                        
##      saffron,                        
##      salt,                           
##      tomatoes}                     18
## [3] {fresh lemon juice,              
##      fresh water,                    
##      lemon,                          
##      lemons,                         
##      rind of,                        
##      sugar,                          
##      zest of}                      38
## [4] {black pepper,                   
##      eggplant,                       
##      extra firm tofu,                
##      garlic cloves,                  
##      honey,                          
##      lemon juice,                    
##      low sodium soy sauce,           
##      maple syrup,                    
##      mushrooms,                      
##      mustard powder,                 
##      olive oil,                      
##      red wine vinegar,               
##      soy sauce,                      
##      zucchini}                     10
## [5] {cabbage,                        
##      carrots,                        
##      celery,                         
##      onion,                          
##      plain tomato juice}           16

# apply weclat algorithm with same Eclat parameters 
weclat_itemset <- weclat(trans, 
                         parameter = list(support=0.009, minlen=2, maxlen=15),
                        control = list(verbose = TRUE))

## Weighted Eclat (WEclat)
## 
## parameter specification:
##  support minlen maxlen target ext
##    0.009      2     15   <NA>  NA
## 
## algorithmic control:
##  sort verbose
##    NA    TRUE

summary(weclat_itemset)

## set of 594 itemsets
## 
## most frequent items:
##    salt   sugar  butter    eggs   flour (Other) 
##     216     148     142     125     123     832 
## 
## element (itemset/transaction) length distribution:sizes
##   2   3   4   5 
## 290 216  82   6 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    2.00    3.00    2.67    3.00    5.00 
## 
## summary of quality measures:
##     support        
##  Min.   :0.009007  
##  1st Qu.:0.010885  
##  Median :0.013637  
##  Mean   :0.018440  
##  3rd Qu.:0.019825  
##  Max.   :0.118469  
## 
## includes transaction ID lists: FALSE 
## 
## mining info:
##   data ntransactions support
##  trans         20000   0.009

The weclat algorithm finds 594 frequent itemsets, four fewer than Eclat (598).

## create association rules
weclat_rules <- ruleInduction(weclat_itemset, trans, confidence = .6)

summary(weclat_rules)

## set of 277 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3   4   5 
##  17 120 122  18 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   4.000   3.509   4.000   5.000 
## 
## summary of quality measures:
##     support          confidence          lift           itemset     
##  Min.   :0.00895   Min.   :0.6011   Min.   : 1.549   Min.   :  1.0  
##  1st Qu.:0.01080   1st Qu.:0.6453   1st Qu.: 1.937   1st Qu.:222.0  
##  Median :0.01305   Median :0.6891   Median : 2.554   Median :316.0  
##  Mean   :0.01650   Mean   :0.6987   Mean   : 2.807   Mean   :309.8  
##  3rd Qu.:0.01855   3rd Qu.:0.7506   3rd Qu.: 3.203   3rd Qu.:396.0  
##  Max.   :0.09060   Max.   :0.8511   Max.   :12.097   Max.   :582.0  
## 
## mining info:
##   data ntransactions support confidence
##  trans         20000   0.009        0.6

inspect(weclat_rules[1:10])

##      lhs                          rhs           support confidence lift     
## [1]  {white pepper}            => {salt}        0.00990 0.6947368   1.788715
## [2]  {buttermilk}              => {baking soda} 0.01135 0.6878788  10.910052
## [3]  {buttermilk, salt}        => {baking soda} 0.00900 0.7627119  12.096937
## [4]  {baking soda, buttermilk} => {salt}        0.00900 0.7929515   2.041585
## [5]  {buttermilk}              => {salt}        0.01180 0.7151515   1.841276
## [6]  {salt, shortening}        => {flour}       0.01040 0.6246246   4.196336
## [7]  {flour, shortening}       => {salt}        0.01040 0.8320000   2.142122
## [8]  {shortening, sugar}       => {salt}        0.01005 0.8375000   2.156282
## [9]  {salt, shortening}        => {sugar}       0.01005 0.6036036   2.485500
## [10] {shortening}              => {salt}        0.01665 0.7985612   2.056028
##      itemset
## [1]   1     
## [2]   3     
## [3]   4     
## [4]   4     
## [5]   6     
## [6]  13     
## [7]  13     
## [8]  15     
## [9]  15     
## [10] 16

plot(weclat_rules[1:50], method="paracoord")

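The strongest rules again link buttermilk and salt with baking soda. The maximum lift (12.097) and maximum confidence (0.8511) are identical to those of the Eclat rules, so the weighting changes which itemsets pass the support threshold (277 rules instead of 289) rather than the strength of the top rules.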

plot(weclat_rules, method = "graph")

## Warning: Too many rules supplied. Only plotting the best 100 using 'lift'
## (change control parameter max if needed).

Conclusion

In summary, our exploration of association rule mining in a food dataset, employing Apriori, Eclat, and Weclat algorithms, has revealed meaningful patterns in ingredient combinations. Apriori laid the groundwork for frequent itemset identification, Eclat efficiently handled large datasets, and Weclat extended the analysis to weighted transaction data. The findings underscore prevalent ingredient associations, offering valuable insights into culinary compositions and showcasing the versatility of association rule mining in deciphering intricate food-related datasets.

Food recipe - Association rules

Samidullo Abdullaev

2023-12-28
