Introduction:

The TasteTrios dataset opens the doors to an exploration into the art of ingredient combinations, focusing on main ingredients complemented by two sub-ingredients. This report delves into the complexity of the dataset, employing Principal Component Analysis (PCA) to unravel the underlying patterns and discover key insights into the compatibility of ingredients.

In the culinary world, the ability to identify ingredients that harmonize well together is a crucial skill. The TasteTrios dataset, with its meticulous categorization and classification into compatibility levels, provides a rich source for uncovering the nuances of flavor combinations. This analysis aims to reveal the primary contributors to flavor variation, identify top ingredient pairs associated with highly compatible dishes, and offer valuable insights for culinary enthusiasts and professionals.

As I walk through the findings of the analysis, I explore the top contributing ingredients for Principal Components 1 and 2 (PC1 and PC2), examine the scores of individual dishes along these components, and highlight ingredient pairs frequently associated with dishes classified as “Highly Compatible.” These insights pave the way for a deeper understanding of the dataset and offer practical tips for creating delicious recipes and exploring new flavor profiles.

Hypothesis:

The exploration of the TasteTrios dataset is driven by the following hypotheses:

Hypothesis 1: Principal Component Analysis (PCA) will unveil key contributing ingredients.
I hypothesize that PCA, applied to the dataset, will reveal a set of principal components capturing the primary sources of variation among the ingredients. By identifying the top contributors for PC1 and PC2, I anticipate gaining insights into the ingredients that play a significant role in shaping the overall flavor profiles.

Hypothesis 2: Highly Compatible dishes will exhibit distinct patterns in PCA space.
I hypothesize that dishes classified as “Highly Compatible” will exhibit distinct patterns in the PCA space, as reflected by their scores along PC1 and PC2. I expect to observe clustering or positioning that distinguishes highly compatible dishes from others, providing visual evidence of the effectiveness of the compatibility classification.

Hypothesis 3: Specific ingredient pairs contribute to the Highly Compatible classification.
My third hypothesis suggests that certain pairs of ingredients, when combined, will be more prevalent in dishes labeled as “Highly Compatible.” I anticipate that identifying and analyzing the frequency of these ingredient pairs will provide actionable insights for crafting recipes that align with the highly compatible classification.

Dataset Overview:

Overview:

The TasteTrios dataset brings us into the world of perfect ingredient combinations, focusing on main ingredients that harmonize with two sub-ingredients. With 607 rows and 4 columns, this rich dataset categorizes flavor trios into three distinct compatibility levels: Highly Compatible, Moderately Compatible, and Compatible.

Dataset Structure:

The tabular structure of the dataset encompasses the following key attributes:

Main Ingredient (Ingredient.1): The star of the show, serving as the foundation for the flavor trio.
Sub-Ingredient 1 (Ingredient.2): The first supporting ingredient, contributing to the delightful synergy of flavors.
Sub-Ingredient 2 (Ingredient.3): The second complementary ingredient, enhancing the overall taste experience.
Compatibility Level (Classification.Output): A classification indicating whether the trio is Highly Compatible, Moderately Compatible, or Compatible.

Compatibility Levels:

Highly Compatible: Unveils combinations celebrated for their exceptional synergy, creating culinary masterpieces that resonate widely.
Moderately Compatible: Highlights combinations with a solid balance of flavors, offering versatility and appeal to various tastes.
Compatible: Showcases combinations that work harmoniously, delivering a delightful taste experience suitable for diverse culinary applications.

Dataset Statistics:

Number of Rows: 607
Number of Columns: 4

PCA Analysis:

Loading Libraries and Reading the Dataset.

The analysis begins by loading essential R libraries for data manipulation and visualization, including dplyr, ggplot2, and tidyr.

# Install the packages (only needed once per machine)
install.packages(c("dplyr", "ggplot2", "tidyr"))

# Load necessary libraries
library(dplyr)
library(ggplot2)
library(tidyr)

The TasteTrios dataset is then read into a data frame named taste_data.
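The loading step itself is not shown. Below is a minimal sketch using base R's read.csv(), under two assumptions: the file is a CSV, and its headers contain spaces (e.g. “Ingredient 1”). By default, read.csv() passes headers through make.names(), which would produce the Ingredient.1-style names used throughout this report. The real file name and location are not given, so the sketch writes a hypothetical three-row stand-in to a temporary file. The rows are copied from the taste_data table in this report.

```r
# Hypothetical stand-in for the real TasteTrios CSV (actual file name/path not given)
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "Ingredient 1,Ingredient 2,Ingredient 3,Classification Output",
  "Pumpkin,Allspice,Bay Leaf,Highly Compatible",
  "Pumpkin,Pasta,Butter,Moderately Compatible",
  "Pumpkin,Cream Cheese,Orange,Compatible"
), csvPath)

# check.names = TRUE (the default) runs make.names() on the headers,
# so "Ingredient 1" becomes "Ingredient.1"
taste_data <- read.csv(csvPath, stringsAsFactors = FALSE)

str(taste_data)
```

With the real file, only the path in read.csv() would change.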

Selecting Relevant Columns and Encoding Categorical Variables.

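I select the columns relevant for PCA: “Ingredient.1”, “Ingredient.2”, and “Ingredient.3”. Because the ingredients are categorical, model.matrix() converts them into numeric indicator (dummy) variables. With the intercept removed (- 1), Ingredient.1 receives one column for every ingredient, while Ingredient.2 and Ingredient.3 each drop one reference level. The result is therefore close to, but not exactly, full one-hot encoding.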

# Select relevant columns for PCA
ingredients <- taste_data[, c("Ingredient.1", "Ingredient.2", "Ingredient.3")]

# Encode the categorical variables (ingredients) into numerical values
encodedIngredients <- model.matrix(~ . - 1, data = ingredients)
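The sketch below makes the encoding concrete. It runs the same model.matrix() call on eight trios taken from the taste_data table in this report, not on the full dataset. The second call is a swapped-in variant shown only for comparison: it passes identity contrasts for every column, which yields full one-hot encoding.

```r
# Eight trios copied from the taste_data table in this report.
# Factors are used because contrasts() only accepts factors.
ingredients <- data.frame(
  Ingredient.1 = c("Pumpkin", "Pumpkin", "Pumpkin", "Pumpkin",
                   "Mushroom", "Mushroom", "Mushroom", "Mushroom"),
  Ingredient.2 = c("Allspice", "Cinnamon", "Pasta", "Garlic",
                   "Thyme", "Butter", "Onion", "Garlic"),
  Ingredient.3 = c("Bay Leaf", "Ginger", "Butter", "Butter",
                   "Garlic", "Sage", "Rosemary", "Lemon Zest"),
  stringsAsFactors = TRUE
)

# Same call as the report: every level of Ingredient.1,
# but k - 1 levels (treatment contrasts) for the other two columns
encodedDefault <- model.matrix(~ . - 1, data = ingredients)

# Variant: identity contrasts keep every level of every column (full one-hot)
encodedOneHot <- model.matrix(
  ~ . - 1, data = ingredients,
  contrasts.arg = lapply(ingredients, contrasts, contrasts = FALSE)
)

sapply(ingredients, nlevels)  # 2, 7, 7 levels
ncol(encodedDefault)          # 2 + 6 + 6 = 14 columns
ncol(encodedOneHot)           # 2 + 7 + 7 = 16 columns
```

In both versions, the same ingredient in different positions becomes separate columns (for example Ingredient.2Garlic and Ingredient.3Garlic). PCA therefore treats an ingredient's position in the trio as part of the variable.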

Performing PCA.

Principal Component Analysis (PCA) is performed on the encoded ingredients, scaling the variables to have unit variance.

# Perform PCA
pcaResult <- prcomp(encodedIngredients, scale. = TRUE)
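The report does not print how much variance PC1 and PC2 capture, which helps when judging how much weight their loadings deserve. Below is a minimal sketch on the same eight toy trios (not the full dataset). With scale. = TRUE, every encoded column has unit variance. The squared standard deviations therefore sum to the number of encoded columns, and each component's share is sdev^2 / sum(sdev^2).

```r
# Eight trios copied from the taste_data table in this report
ingredients <- data.frame(
  Ingredient.1 = c("Pumpkin", "Pumpkin", "Pumpkin", "Pumpkin",
                   "Mushroom", "Mushroom", "Mushroom", "Mushroom"),
  Ingredient.2 = c("Allspice", "Cinnamon", "Pasta", "Garlic",
                   "Thyme", "Butter", "Onion", "Garlic"),
  Ingredient.3 = c("Bay Leaf", "Ginger", "Butter", "Butter",
                   "Garlic", "Sage", "Rosemary", "Lemon Zest"),
  stringsAsFactors = FALSE
)

# Same encoding and PCA calls as the report
encodedIngredients <- model.matrix(~ . - 1, data = ingredients)
pcaResult <- prcomp(encodedIngredients, scale. = TRUE)

# Variance of each component is sdev^2; dividing by the total gives its share
varExplained <- pcaResult$sdev^2
propExplained <- varExplained / sum(varExplained)

round(propExplained[1:2], 3)     # share captured by PC1 and PC2
round(cumsum(propExplained), 3)  # cumulative share across components
summary(pcaResult)$importance    # the same figures as reported by summary()
```

On the real data, summary(pcaResult) alone prints these proportions (rounded to five decimals).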

Extracting Loadings and Creating a Data Frame.

Loadings are extracted from the rotation matrix of the PCA results. They describe how strongly each encoded ingredient column (for example Ingredient.1Walnuts) contributes to PC1 and PC2. A data frame named loadingData is created to display these column names along with their loadings.

# Extract loadings
loadings <- pcaResult$rotation[, 1:2]

# Create a data frame with ingredient names and loadings
loadingData <- data.frame(Ingredient = colnames(encodedIngredients), PC1_Loadings = loadings[, 1], PC2_Loadings = loadings[, 2])

Displaying Top Contributing Ingredients for PC1 and PC2.

The top contributing ingredients for PC1 and PC2 are selected from the loadingData data frame, sorted based on the absolute values of loadings.

# Display the top contributing ingredients for PC1 and PC2
topContributorsPC1 <- head(loadingData[order(abs(loadingData$PC1_Loadings), decreasing = TRUE), ], 5)
topContributorsPC2 <- head(loadingData[order(abs(loadingData$PC2_Loadings), decreasing = TRUE), ], 5)

print("Top Contributors for PC1:")
print(topContributorsPC1)

print("Top Contributors for PC2:")
print(topContributorsPC2)

Top Contributors for PC1:
Ingredient PC1_Loadings PC2_Loadings
Ingredient.1Walnuts 0.3053534 -0.0205575
Ingredient.1Potato -0.1860897 0.1723672
Ingredient.1Mushroom -0.1789891 0.0153631
Ingredient.1Blueberries 0.1733859 0.0052043
Ingredient.1Banana 0.1684649 0.0317120

Top Contributors for PC2:
Ingredient PC1_Loadings PC2_Loadings
Ingredient.1Pork -0.0749252 0.3084544
Ingredient.1Tomato -0.1223276 -0.2675961
Ingredient.1Beef -0.1584708 -0.2641685
Ingredient.2Paprika -0.0538129 0.1746456
Ingredient.3Garlic Powder -0.0538129 0.1746456
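Ranking by absolute loading can also be expressed as a share of each component. Each column of pcaResult$rotation is a unit vector, so its squared loadings sum to 1. That makes 100 * loading^2 the percentage contribution of each encoded column, and it gives the same ranking as abs(). For example, the Walnuts loading of 0.3053534 on PC1 corresponds to about 9.3% of that component. Below is a minimal sketch on the eight toy trios used for illustration (not the full dataset).

```r
# Eight trios copied from the taste_data table in this report
ingredients <- data.frame(
  Ingredient.1 = c("Pumpkin", "Pumpkin", "Pumpkin", "Pumpkin",
                   "Mushroom", "Mushroom", "Mushroom", "Mushroom"),
  Ingredient.2 = c("Allspice", "Cinnamon", "Pasta", "Garlic",
                   "Thyme", "Butter", "Onion", "Garlic"),
  Ingredient.3 = c("Bay Leaf", "Ginger", "Butter", "Butter",
                   "Garlic", "Sage", "Rosemary", "Lemon Zest"),
  stringsAsFactors = FALSE
)
encodedIngredients <- model.matrix(~ . - 1, data = ingredients)
pcaResult <- prcomp(encodedIngredients, scale. = TRUE)

# Percentage contribution of each encoded column to PC1 and PC2
contrib <- 100 * pcaResult$rotation[, 1:2]^2
colSums(contrib)  # 100 for each component

# Top five contributors to PC1, as percentages
topPC1 <- head(sort(contrib[, "PC1"], decreasing = TRUE), 5)
round(topPC1, 1)
```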

Creating Tables for Top Contributors of PC1 and PC2:

Top Contributors for PC1: I create a data frame named tablePC1 to store the top contributing ingredients for Principal Component 1 (PC1). This table includes columns for the ingredient names and their corresponding loadings on PC1.

# Create a table for top contributors of PC1
tablePC1 <- data.frame(
  Ingredient = topContributorsPC1$Ingredient,
  PC1_Loadings = topContributorsPC1$PC1_Loadings
)

print("Top Contributors for PC1:")
print(tablePC1)
Top Contributors for PC1:
Ingredient PC1_Loadings
Ingredient.1Walnuts 0.3053534
Ingredient.1Potato -0.1860897
Ingredient.1Mushroom -0.1789891
Ingredient.1Blueberries 0.1733859
Ingredient.1Banana 0.1684649

Top Contributors for PC2:
A similar table, tablePC2, is created to store the top contributing ingredients for Principal Component 2 (PC2). This table also includes columns for the ingredient names and their corresponding loadings on PC2.

# Create a table for top contributors of PC2
tablePC2 <- data.frame(
  Ingredient = topContributorsPC2$Ingredient,
  PC2_Loadings = topContributorsPC2$PC2_Loadings
)

print("Top Contributors for PC2:")
print(tablePC2)
Top Contributors for PC2:
Ingredient PC2_Loadings
Ingredient.1Pork 0.3084544
Ingredient.1Tomato -0.2675961
Ingredient.1Beef -0.2641685
Ingredient.2Paprika 0.1746456
Ingredient.3Garlic Powder 0.1746456

Extracting Scores and Modifying the Original Dataset:

Scores along PC1 and PC2 are extracted from the PCA results and added to the original dataset taste_data. These scores represent the positioning of each dish in the PCA space.

# Extract scores for each dish along PC1 and PC2
scores <- pcaResult$x

# Add scores to the original dataset
taste_data$PC1_Score <- scores[, 1]
taste_data$PC2_Score <- scores[, 2]
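Each score is the dish's standardized encoded row multiplied by the loading vectors. In other words, pcaResult$x equals scale(encodedIngredients) %*% pcaResult$rotation. predict() applies the same projection and can place a new dish in the existing PCA space, provided it is encoded with exactly the same columns. Below is a minimal sketch on the eight toy trios used for illustration (not the full dataset).

```r
# Eight trios copied from the taste_data table in this report
ingredients <- data.frame(
  Ingredient.1 = c("Pumpkin", "Pumpkin", "Pumpkin", "Pumpkin",
                   "Mushroom", "Mushroom", "Mushroom", "Mushroom"),
  Ingredient.2 = c("Allspice", "Cinnamon", "Pasta", "Garlic",
                   "Thyme", "Butter", "Onion", "Garlic"),
  Ingredient.3 = c("Bay Leaf", "Ginger", "Butter", "Butter",
                   "Garlic", "Sage", "Rosemary", "Lemon Zest"),
  stringsAsFactors = FALSE
)
encodedIngredients <- model.matrix(~ . - 1, data = ingredients)
pcaResult <- prcomp(encodedIngredients, scale. = TRUE)

# Scores by hand: center and scale each column, then rotate onto the PCs
manualScores <- scale(encodedIngredients) %*% pcaResult$rotation
all.equal(unname(manualScores), unname(pcaResult$x))  # TRUE

# predict() reuses the stored centering, scaling and rotation for given rows
newScores <- predict(pcaResult, newdata = encodedIngredients[1:2, , drop = FALSE])
newScores[, 1:2]
```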

Adding Highly Compatible Classification:

A new column named HighlyCompatible is added to the dataset, indicating whether each dish is classified as “Highly Compatible.” This classification is based on the comparison of the Classification.Output column with the value Highly Compatible.

# Add a column indicating the Highly Compatible classification
taste_data$HighlyCompatible <- (taste_data$Classification.Output == "Highly Compatible")
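Hypothesis 2 concerns where Highly Compatible dishes sit along PC1 and PC2. Below is a minimal sketch of one way to check this: it compares the mean score of each group and, if ggplot2 is installed, builds a scatter plot colored by HighlyCompatible. It uses only the scores of the first 25 dishes listed in this report, so it illustrates the mechanics rather than the full-dataset result. On the real data, the same calls would run directly on taste_data.

```r
# PC scores and flags for the first 25 dishes, copied from the table in this report
scoresDf <- data.frame(
  PC1_Score = c(-0.8910157, -0.6678456, -2.0165888, -1.2741037, -1.2241649,
                -2.0392980, -1.6180660, -1.6702822, -0.9001725, -1.5442430,
                -2.1103365, -1.8820949, -1.5691672, -2.7713834, -2.6860904,
                -2.7924824, -0.6780891, -1.1813356, -2.3436873, -1.3630086,
                -1.6729146, -2.1951774, -2.0705426, -2.4689787, -2.9128185),
  PC2_Score = c(0.3736151, 0.6863480, 0.8972191, 0.7832284, 0.9760878,
                0.6153983, 1.1172530, 0.0616271, -1.0781422, 1.7367243,
                -0.0111933, 1.5216851, 2.2084197, -0.3594805, -1.2398302,
                -2.2059502, -0.1181095, -0.1271337, 0.8851024, 0.1256193,
                0.1379083, -0.2783752, 0.1164483, -0.4841222, -0.3381520),
  HighlyCompatible = c(TRUE, TRUE, FALSE, FALSE, TRUE,
                       TRUE, FALSE, FALSE, TRUE, TRUE,
                       TRUE, TRUE, TRUE, FALSE, TRUE,
                       TRUE, FALSE, FALSE, TRUE, TRUE,
                       FALSE, FALSE, TRUE, TRUE, FALSE)
)

# Mean position of each group along PC1 and PC2
groupMeans <- aggregate(cbind(PC1_Score, PC2_Score) ~ HighlyCompatible,
                        data = scoresDf, FUN = mean)
print(groupMeans)

# Optional scatter plot; ggplot2 is one of the packages the report loads
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(scoresDf, ggplot2::aes(x = PC1_Score, y = PC2_Score,
                                              colour = HighlyCompatible)) +
    ggplot2::geom_point()
  # print(p) draws the plot
}
```

Group means only summarize location. Overlap between the two groups is easier to judge from the plot.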

Identifying Ingredient Pairs for Highly Compatible Dishes:

For dishes labeled as “Highly Compatible,” the analysis then identifies pairs of ingredients. The combn function forms every pair within each trio (three pairs per dish, in the column order of the trio) and joins the two names with a “ + ” separator.

# Identify ingredient pairs for rows labeled as "Highly Compatible"
highlyCompatiblePairs <- apply(ingredients[taste_data$HighlyCompatible, ], 1, function(x) combn(x, 2, paste, collapse = " + "))
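For a single trio, combn(x, 2, ...) returns its three pairs in column order: (1, 2), (1, 3), (2, 3). The report's apply() call runs this row by row, producing a character matrix with three rows and one column per Highly Compatible dish. Below is a minimal sketch using two Highly Compatible trios from the taste_data table in this report.

```r
# Pairs for one trio, in column order
trio <- c("Pumpkin", "Allspice", "Bay Leaf")
combn(trio, 2, paste, collapse = " + ")
# gives "Pumpkin + Allspice", "Pumpkin + Bay Leaf", "Allspice + Bay Leaf"

# Two Highly Compatible trios from the taste_data table in this report
hcTrios <- data.frame(
  Ingredient.1 = c("Pumpkin", "Mushroom"),
  Ingredient.2 = c("Allspice", "Thyme"),
  Ingredient.3 = c("Bay Leaf", "Garlic"),
  stringsAsFactors = FALSE
)

# Same call as the report, applied to every row: a 3 x n character matrix
pairMatrix <- apply(hcTrios, 1, function(x) combn(x, 2, paste, collapse = " + "))
dim(pairMatrix)  # 3 2
```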

First 25 rows of taste_data after adding the HighlyCompatible column:
Ingredient.1 | Ingredient.2 | Ingredient.3 | Classification.Output | PC1_Score | PC2_Score | HighlyCompatible
Pumpkin | Allspice | Bay Leaf | Highly Compatible | -0.8910157 | 0.3736151 | TRUE
Pumpkin | Cinnamon | Ginger | Highly Compatible | -0.6678456 | 0.6863480 | TRUE
Pumpkin | Pasta | Butter | Moderately Compatible | -2.0165888 | 0.8972191 | FALSE
Pumpkin | Apples | Curry | Moderately Compatible | -1.2741037 | 0.7832284 | FALSE
Pumpkin | Brown Sugar | Pine Nuts | Highly Compatible | -1.2241649 | 0.9760878 | TRUE
Pumpkin | Garlic | Butter | Highly Compatible | -2.0392980 | 0.6153983 | TRUE
Pumpkin | Chile Peppers | Garlic | Moderately Compatible | -1.6180660 | 1.1172530 | FALSE
Pumpkin | Cream Cheese | Orange | Compatible | -1.6702822 | 0.0616271 | FALSE
Pumpkin | Pumpkin Seeds | Cream Cheese | Highly Compatible | -0.9001725 | -1.0781422 | TRUE
Pumpkin | Honey | Balsamic Vinegar | Highly Compatible | -1.5442430 | 1.7367243 | TRUE
Pumpkin | Olive Oil | Rosemary | Highly Compatible | -2.1103365 | -0.0111933 | TRUE
Mushroom | Thyme | Garlic | Highly Compatible | -1.8820949 | 1.5216851 | TRUE
Mushroom | Butter | Sage | Highly Compatible | -1.5691672 | 2.2084197 | TRUE
Mushroom | Onion | Rosemary | Moderately Compatible | -2.7713834 | -0.3594805 | FALSE
Mushroom | Cream | Parmesan Cheese | Highly Compatible | -2.6860904 | -1.2398302 | TRUE
Mushroom | Red Wine | Shallots | Highly Compatible | -2.7924824 | -2.2059502 | TRUE
Mushroom | Spinach | Feta Cheese | Moderately Compatible | -0.6780891 | -0.1181095 | FALSE
Mushroom | Balsamic Vinegar | Cherry Tomatoes | Compatible | -1.1813356 | -0.1271337 | FALSE
Mushroom | Soy Sauce | Ginger | Highly Compatible | -2.3436873 | 0.8851024 | TRUE
Mushroom | Lemon | Dill | Highly Compatible | -1.3630086 | 0.1256193 | TRUE
Mushroom | Bacon | Cheddar Cheese | Moderately Compatible | -1.6729146 | 0.1379083 | FALSE
Mushroom | Olive Oil | Thyme | Moderately Compatible | -2.1951774 | -0.2783752 | FALSE
Mushroom | Cream Cheese | Chives | Highly Compatible | -2.0705426 | 0.1164483 | TRUE
Mushroom | Garlic | Lemon Zest | Highly Compatible | -2.4689787 | -0.4841222 | TRUE
Mushroom | White Wine | Tarragon | Moderately Compatible | -2.9128185 | -0.3381520 | FALSE

Counting and Displaying Ingredient Pair Frequencies:

The resulting matrix of ingredient pairs is flattened into a single vector highlyCompatiblePairs. The occurrences of each unique ingredient pair are then counted using the table function, and a data frame sortedPairsTable is created to display the pairs sorted by frequency in descending order.

# Flatten the 3 x n pairs matrix into a single character vector
highlyCompatiblePairs <- as.vector(highlyCompatiblePairs)

# Count occurrences of ingredient pairs
pairOccurrences <- table(highlyCompatiblePairs)

# Display the table sorted by frequency
sortedPairsTable <- data.frame(IngredientPair = names(pairOccurrences), Frequency = as.numeric(pairOccurrences))
sortedPairsTable <- sortedPairsTable[order(sortedPairsTable$Frequency, decreasing = TRUE), ]

print(sortedPairsTable)

Most frequent ingredient pairs in Highly Compatible dishes (first 25 rows of the output):
IngredientPair Frequency
549 Tomato + Basil 7
243 Garlic + Butter 5
16 Avocado + Cilantro 4
26 Avocado + Lime 4
142 Chicken + Lemon 4
174 Cilantro + Lime 4
207 Cucumber + Red Onion 4
341 Mushroom + Thyme 4
518 Shrimp + Garlic 4
519 Shrimp + Lemon 4
526 Smoked Salmon + Cream Cheese 4
585 Tomato + Red Onion 4
614 Walnuts + Goat Cheese 4
59 Banana + Greek Yogurt 3
70 Basil + Tomato 3
128 Cheddar Cheese + Bacon 3
162 Chickpeas + Lemon 3
234 Eggs + Spinach 3
256 Greek Yogurt + Honey 3
271 Lemon + Capers 3
276 Lemon + Garlic 3
294 Mango + Chili Powder 3
311 Maple Syrup + Mustard 3
324 Mushroom + Garlic 3
439 Potato + Garlic 3
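Because combn() keeps the column order of each trio, the same two ingredients can be counted under two labels. The table above lists Tomato + Basil (7) and Basil + Tomato (3) separately, which together make 10 occurrences. One way to merge such duplicates, sketched below, is to sort each trio alphabetically before forming pairs. The three trios are hypothetical, chosen only to show the effect; they are not rows from the dataset.

```r
# Hypothetical trios (not from the dataset) where Tomato and Basil swap positions
trios <- data.frame(
  Ingredient.1 = c("Tomato", "Basil", "Tomato"),
  Ingredient.2 = c("Basil", "Tomato", "Red Onion"),
  Ingredient.3 = c("Garlic", "Red Onion", "Basil"),
  stringsAsFactors = FALSE
)

# Sorting each trio first gives "Basil + Tomato" and "Tomato + Basil" one shared label
unorderedPairs <- function(x) combn(sort(x), 2, paste, collapse = " + ")
pairs <- as.vector(apply(trios, 1, unorderedPairs))

# Count and sort, as in the report
pairOccurrences <- table(pairs)
sortedPairsTable <- data.frame(IngredientPair = names(pairOccurrences),
                               Frequency = as.numeric(pairOccurrences))
sortedPairsTable <- sortedPairsTable[order(sortedPairsTable$Frequency, decreasing = TRUE), ]
print(sortedPairsTable)
```

Alphabetical order is an arbitrary but consistent choice. Any fixed ordering of the two names would merge the duplicates equally well.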

Conclusion:

In concluding this exploration of the TasteTrios dataset, it’s worth revisiting the hypotheses that guided this analysis. Each hypothesis, grounded in the application of Principal Component Analysis (PCA) and the goal of uncovering distinctive patterns, shaped the methodology and expectations.

Hypothesis 1: Unveiling Key Contributors with PCA

I hypothesized that applying Principal Component Analysis (PCA) to the dataset would reveal key contributing ingredients. The loadings bear this out. Walnuts, Potato, and Mushroom have the largest absolute loadings on PC1, and Pork, Tomato, and Beef lead on PC2, consistent with my initial expectation. Most of these top contributors are main-ingredient (Ingredient.1) columns. PCA proved to be a useful tool for identifying which ingredients drive the main axes of variation in the dataset.

Hypothesis 2: Distinct Patterns in PCA Space for Highly Compatible Dishes

I proposed that dishes classified as “Highly Compatible” would exhibit distinct patterns in the PCA space. Visualizing these dishes along PC1 and PC2 showed clustering and positioning that I read as support for this hypothesis and for the compatibility classification, suggesting that taste harmony is reflected in the PCA space. Within the first 25 rows listed above, however, Highly Compatible and other dishes overlap on both scores. The separation is therefore partial rather than clean and is worth confirming on the full dataset.

Hypothesis 3: Contribution of Specific Ingredient Pairs

My third hypothesis centered on the contribution of specific ingredient pairs to the “Highly Compatible” classification, and the frequency analysis of pairs in highly compatible dishes supports it. Several pairs recur: Tomato + Basil (7 occurrences, plus 3 more recorded as Basil + Tomato), Garlic + Butter (5), and Avocado + Cilantro (4). This indicates that certain combinations play a prominent role in achieving the highly compatible classification. These insights now guide me in crafting recipes with a more nuanced understanding of essential pairings.

In reflection, the alignment of theoretical expectations with practical outcomes gives me confidence in this analysis. This journey into the intersection of hypotheses, data, and culinary arts reflects a holistic approach that enriches both my analytical and creative pursuits. As I move forward with these insights, the kitchen becomes a canvas where I can experiment, innovate, and elevate my culinary creations with a better understanding of taste combinations. These findings should also be useful for people pursuing culinary careers and for everyday home cooking.