The TasteTrios dataset opens the door to an exploration of the art of ingredient combinations, focusing on main ingredients complemented by two sub-ingredients. This report delves into the complexity of the dataset, employing Principal Component Analysis (PCA) to unravel the underlying patterns and discover key insights into the compatibility of ingredients.
In the culinary world, the ability to identify ingredients that harmonize well together is a crucial skill. The TasteTrios dataset, with its meticulous categorization and classification into compatibility levels, provides a rich source for uncovering the nuances of flavor combinations. This analysis aims to reveal the primary contributors to flavor variation, identify top ingredient pairs associated with highly compatible dishes, and offer valuable insights for culinary enthusiasts and professionals.
As I walk through the findings of the analysis, I will explore the top contributing ingredients for Principal Components 1 and 2 (PC1 and PC2), examine the scores of individual dishes along these components, and highlight ingredient pairs frequently associated with dishes classified as “Highly Compatible.” These insights pave the way for a deeper understanding of the dataset and offer practical tips for creating delicious recipes and exploring new flavor profiles.
The exploration of the TasteTrios dataset is driven by the following hypotheses:
Hypothesis 1: Principal Component Analysis (PCA) will unveil key contributing ingredients.
I hypothesize that PCA, applied to the dataset, will reveal a set of principal components capturing the primary sources of variation among the ingredients. By identifying the top contributors for PC1 and PC2, I anticipate gaining insights into the ingredients that play a significant role in shaping the overall flavor profiles.
Hypothesis 2: Highly Compatible dishes will exhibit distinct patterns in PCA space.
I hypothesize that dishes classified as “Highly Compatible” will exhibit distinct patterns in the PCA space, as reflected by their scores along PC1 and PC2. I expect to observe clustering or positioning that distinguishes highly compatible dishes from others, providing visual evidence of the effectiveness of the compatibility classification.
Hypothesis 3: Specific ingredient pairs contribute to the Highly Compatible classification.
My third hypothesis is that certain pairs of ingredients, when combined, will be more prevalent in dishes labeled as “Highly Compatible.” I anticipate that identifying and analyzing the frequency of these ingredient pairs will provide actionable insights for crafting recipes that align with the highly compatible classification.
The TasteTrios dataset takes us into the world of ingredient combinations, focusing on main ingredients harmonizing with two sub-ingredients. With 607 rows and 4 columns, it categorizes flavor trios into three compatibility levels: Highly Compatible, Moderately Compatible, and Compatible.
The tabular structure of the dataset encompasses the following key attributes:
- Main Ingredient (Ingredient.1): The star of the show, serving as the foundation for the flavor trio.
- Sub-Ingredient 1 (Ingredient.2): The first supporting ingredient, contributing to the delightful synergy of flavors.
- Sub-Ingredient 2 (Ingredient.3): The second complementary ingredient, enhancing the overall taste experience.
- Compatibility Level (Classification.Output): A classification indicating whether the trio is Highly Compatible, Moderately Compatible, or Compatible.
  - Highly Compatible: Unveils combinations celebrated for their exceptional synergy, creating culinary masterpieces that resonate widely.
  - Moderately Compatible: Highlights combinations with a solid balance of flavors, offering versatility and appeal to various tastes.
  - Compatible: Showcases combinations that work harmoniously, delivering a delightful taste experience suitable for diverse culinary applications.
Number of Rows: 607, Number of Columns: 4
Loading Libraries and Reading the Dataset
The analysis begins by loading essential R libraries for data manipulation and visualization: dplyr, ggplot2, and tidyr.
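```r
# Install any required packages that are not available yet
requiredPackages <- c("dplyr", "ggplot2", "tidyr")
isInstalled <- vapply(requiredPackages, requireNamespace, logical(1), quietly = TRUE)
if (any(!isInstalled)) install.packages(requiredPackages[!isInstalled])
# Load necessary libraries
library(dplyr)
library(ggplot2)
library(tidyr)
```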
The TasteTrios dataset is then read into a data frame named taste_data.
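The report does not show the code that reads the file. The following is a minimal sketch, assuming the data is stored as a CSV file whose header uses spaces (for example `Ingredient 1`): `read.csv()` passes header names through `make.names()`, which would explain the `Ingredient.1` and `Classification.Output` names used throughout. The file name is unknown, so the sketch writes three rows from this report's score table to a temporary file so it runs on its own.

```r
# Stand-in for the real file: its name and location are not given in this report,
# so three rows from the score table are written to a temporary CSV.
# Assumption: the original header uses spaces, e.g. "Ingredient 1".
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "Ingredient 1,Ingredient 2,Ingredient 3,Classification Output",
  "Pumpkin,Allspice,Bay Leaf,Highly Compatible",
  "Pumpkin,Pasta,Butter,Moderately Compatible",
  "Mushroom,Balsamic Vinegar,Cherry Tomatoes,Compatible"
), csvPath)

# read.csv() applies make.names() to the header (check.names = TRUE by default),
# turning "Ingredient 1" into "Ingredient.1"
taste_data <- read.csv(csvPath, stringsAsFactors = FALSE)

names(taste_data)
cat("Number of Rows:", nrow(taste_data), "Number of Columns:", ncol(taste_data), "\n")
```

With the real file, only the `read.csv()` call pointed at its actual path is needed; for the full dataset the `cat()` line would print 607 rows and 4 columns.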
I select the columns relevant for PCA: “Ingredient.1”, “Ingredient.2”, and “Ingredient.3”. These categorical ingredient variables are then encoded into numerical indicator (one-hot) columns.
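```r
# Select the three ingredient columns used for PCA
ingredients <- taste_data[, c("Ingredient.1", "Ingredient.2", "Ingredient.3")]
# Encode the ingredients as 0/1 indicator columns. With "- 1", only the first
# factor (Ingredient.1) gets a column for every level; Ingredient.2 and
# Ingredient.3 each drop their first (reference) level.
encodedIngredients <- model.matrix(~ . - 1, data = ingredients)
```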
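The `model.matrix(~ . - 1, ...)` call is close to, but not exactly, one-hot encoding: removing the intercept gives the first factor an indicator for every level, while the later factors keep R's default treatment contrasts and drop their first level. The toy frame below (three rows from this report's score table) shows the difference, together with one way to request a full indicator set by passing identity contrasts through `contrasts.arg`. Which encoding suits the analysis is a judgment call; the results reported here come from the default one.

```r
# Three trios taken from the score table in this report
toy <- data.frame(
  Ingredient.1 = c("Pumpkin", "Pumpkin", "Mushroom"),
  Ingredient.2 = c("Cinnamon", "Garlic", "Butter"),
  Ingredient.3 = c("Ginger", "Butter", "Sage"),
  stringsAsFactors = TRUE
)

# Encoding as written in the report: later factors drop their reference level
default_enc <- model.matrix(~ . - 1, data = toy)

# Full one-hot encoding: identity ("no contrast") matrices for every factor
full_enc <- model.matrix(
  ~ . - 1,
  data = toy,
  contrasts.arg = lapply(toy, contrasts, contrasts = FALSE)
)

colnames(default_enc)  # 6 columns: Ingredient.2Butter and Ingredient.3Butter are dropped
colnames(full_enc)     # 8 columns: one per level of each ingredient column
```

In the full encoding every row has exactly three ones, one per ingredient position.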
Principal Component Analysis (PCA) is performed on the encoded ingredients, scaling the variables to have unit variance.
```r
# Perform PCA
pcaResult <- prcomp(encodedIngredients, scale. = TRUE)
```
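The report does not state how much of the total variation PC1 and PC2 capture, but it follows directly from the component standard deviations: the share for component k is sdev_k² divided by the sum of all sdev². A minimal sketch on a six-dish toy subset (rows from this report's score table), using the same encoding and PCA settings; running the last lines on `pcaResult` would give the figures for the full dataset.

```r
# Six dishes taken from the score table in this report
toy <- data.frame(
  Ingredient.1 = c("Pumpkin", "Pumpkin", "Pumpkin", "Mushroom", "Mushroom", "Mushroom"),
  Ingredient.2 = c("Cinnamon", "Garlic", "Olive Oil", "Thyme", "Butter", "Olive Oil"),
  Ingredient.3 = c("Ginger", "Butter", "Rosemary", "Garlic", "Sage", "Thyme"),
  stringsAsFactors = TRUE
)

# Same encoding and PCA settings as the report
toy_encoded <- model.matrix(~ . - 1, data = toy)
toy_pca <- prcomp(toy_encoded, scale. = TRUE)

# Share of the total (scaled) variance captured by each component
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
round(var_explained, 3)

# summary() reports the same values in its "Proportion of Variance" row
summary(toy_pca)$importance["Proportion of Variance", ]
```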
Loadings, representing the contribution of each original ingredient to PC1 and PC2, are extracted from the PCA results. A data frame named loadingData is created to display ingredient names along with their loadings.
```r
# Extract loadings (the rotation matrix) for the first two components
loadings <- pcaResult$rotation[, 1:2]
# Create a data frame with ingredient names and loadings
loadingData <- data.frame(
  Ingredient = colnames(encodedIngredients),
  PC1_Loadings = loadings[, 1],
  PC2_Loadings = loadings[, 2]
)
```
The top contributing ingredients for PC1 and PC2 are selected from the loadingData data frame, sorted by the absolute values of their loadings.
```r
# Display the top 5 contributing ingredients for PC1 and PC2, ranked by absolute loading
topContributorsPC1 <- head(loadingData[order(abs(loadingData$PC1_Loadings), decreasing = TRUE), ], 5)
topContributorsPC2 <- head(loadingData[order(abs(loadingData$PC2_Loadings), decreasing = TRUE), ], 5)
print("Top Contributors for PC1:")
print(topContributorsPC1)
print("Top Contributors for PC2:")
print(topContributorsPC2)
```
Top Contributors for PC1:

| Ingredient | PC1_Loadings | PC2_Loadings |
|---|---|---|
| Ingredient.1Walnuts | 0.3053534 | -0.0205575 |
| Ingredient.1Potato | -0.1860897 | 0.1723672 |
| Ingredient.1Mushroom | -0.1789891 | 0.0153631 |
| Ingredient.1Blueberries | 0.1733859 | 0.0052043 |
| Ingredient.1Banana | 0.1684649 | 0.0317120 |

Top Contributors for PC2:

| Ingredient | PC1_Loadings | PC2_Loadings |
|---|---|---|
| Ingredient.1Pork | -0.0749252 | 0.3084544 |
| Ingredient.1Tomato | -0.1223276 | -0.2675961 |
| Ingredient.1Beef | -0.1584708 | -0.2641685 |
| Ingredient.2Paprika | -0.0538129 | 0.1746456 |
| Ingredient.3Garlic Powder | -0.0538129 | 0.1746456 |
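
Both top-five lists come from the same sort-by-absolute-loading step, so a small helper could replace the repeated code. A sketch with a hypothetical `topContributors()` function, run on the ten loadings printed in the two tables (a subset of the full `loadingData`):

```r
# Ten loadings copied from the two output tables (a subset of loadingData)
loadingData <- data.frame(
  Ingredient = c("Ingredient.1Walnuts", "Ingredient.1Potato", "Ingredient.1Mushroom",
                 "Ingredient.1Blueberries", "Ingredient.1Banana", "Ingredient.1Pork",
                 "Ingredient.1Tomato", "Ingredient.1Beef", "Ingredient.2Paprika",
                 "Ingredient.3Garlic Powder"),
  PC1_Loadings = c(0.3053534, -0.1860897, -0.1789891, 0.1733859, 0.1684649,
                   -0.0749252, -0.1223276, -0.1584708, -0.0538129, -0.0538129),
  PC2_Loadings = c(-0.0205575, 0.1723672, 0.0153631, 0.0052043, 0.0317120,
                   0.3084544, -0.2675961, -0.2641685, 0.1746456, 0.1746456),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n rows with the largest absolute loading in one column
topContributors <- function(loadings, column, n = 5) {
  ord <- order(abs(loadings[[column]]), decreasing = TRUE)
  head(loadings[ord, c("Ingredient", column)], n)
}

topContributors(loadingData, "PC1_Loadings", n = 3)
topContributors(loadingData, "PC2_Loadings", n = 3)
```

Sorting on `abs()` matters because a large negative loading (Potato on PC1) is as influential as a large positive one; the sign only tells in which direction along the component the ingredient pushes a dish.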
Top Contributors for PC1:
I create a data frame named tablePC1 to store the top contributing ingredients for Principal Component 1 (PC1). This table includes columns for the ingredient names and their corresponding loadings on PC1.
```r
# Create a table for top contributors of PC1
tablePC1 <- data.frame(
  Ingredient = topContributorsPC1$Ingredient,
  PC1_Loadings = topContributorsPC1$PC1_Loadings
)
print("Top Contributors for PC1:")
print(tablePC1)
```
| Ingredient | PC1_Loadings |
|---|---|
| Ingredient.1Walnuts | 0.3053534 |
| Ingredient.1Potato | -0.1860897 |
| Ingredient.1Mushroom | -0.1789891 |
| Ingredient.1Blueberries | 0.1733859 |
| Ingredient.1Banana | 0.1684649 |
Top Contributors for PC2:
A similar table, tablePC2, is created to store the top contributing ingredients for Principal Component 2 (PC2). This table also includes columns for the ingredient names and their corresponding loadings on PC2.
```r
# Create a table for top contributors of PC2
tablePC2 <- data.frame(
  Ingredient = topContributorsPC2$Ingredient,
  PC2_Loadings = topContributorsPC2$PC2_Loadings
)
print("Top Contributors for PC2:")
print(tablePC2)
```
| Ingredient | PC2_Loadings |
|---|---|
| Ingredient.1Pork | 0.3084544 |
| Ingredient.1Tomato | -0.2675961 |
| Ingredient.1Beef | -0.2641685 |
| Ingredient.2Paprika | 0.1746456 |
| Ingredient.3Garlic Powder | 0.1746456 |
Scores along PC1 and PC2 are extracted from the PCA results and added to the original dataset taste_data. These scores represent the position of each dish in the PCA space.
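```r
# Extract scores (each dish's coordinates) along PC1 and PC2
scores <- pcaResult$x
# Add scores to the original dataset. This assumes no rows were dropped by
# model.matrix(), which omits rows containing NA values by default.
taste_data$PC1_Score <- scores[, 1]
taste_data$PC2_Score <- scores[, 2]
```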
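The scores in `pcaResult$x` are the centred and scaled indicator matrix projected onto the loadings, so a dish's PC1 score is a weighted sum of its scaled ingredient indicators with the PC1 loadings as weights. A quick check on a six-dish toy subset (rows from this report's score table), using the same encoding and PCA settings:

```r
# Six dishes taken from the score table in this report
toy <- data.frame(
  Ingredient.1 = c("Pumpkin", "Pumpkin", "Pumpkin", "Mushroom", "Mushroom", "Mushroom"),
  Ingredient.2 = c("Cinnamon", "Garlic", "Olive Oil", "Thyme", "Butter", "Olive Oil"),
  Ingredient.3 = c("Ginger", "Butter", "Rosemary", "Garlic", "Sage", "Thyme"),
  stringsAsFactors = TRUE
)
toy_encoded <- model.matrix(~ . - 1, data = toy)
toy_pca <- prcomp(toy_encoded, scale. = TRUE)

# Centre and scale with the statistics prcomp() stored, then project onto the loadings
manualScores <- scale(toy_encoded, center = toy_pca$center, scale = toy_pca$scale) %*%
  toy_pca$rotation

# Compare with the scores prcomp() returned (dimnames dropped for the comparison)
all.equal(unname(manualScores), unname(toy_pca$x))  # TRUE
```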
A new column named HighlyCompatible is added to the dataset, indicating whether each dish is classified as “Highly Compatible.” The flag is TRUE when the Classification.Output column equals the value "Highly Compatible".
```r
# Add a column indicating the Highly Compatible classification
taste_data$HighlyCompatible <- taste_data$Classification.Output == "Highly Compatible"
```
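Hypothesis 2 concerns where “Highly Compatible” dishes fall along PC1 and PC2. The sketch below shows one way to check that: compute each compatibility level's average position and plot the dishes in the PC1/PC2 plane with ggplot2. It uses only nine rows taken from the taste_data score table printed in this report, so it illustrates the mechanics rather than providing evidence; running it on the full `taste_data` would be needed for that.

```r
library(ggplot2)

# Nine rows copied from the taste_data score table in this report (a small subset)
scoresSubset <- data.frame(
  Classification.Output = c("Highly Compatible", "Moderately Compatible", "Compatible",
                            "Highly Compatible", "Highly Compatible", "Moderately Compatible",
                            "Compatible", "Highly Compatible", "Moderately Compatible"),
  PC1_Score = c(-0.8910157, -2.0165888, -1.6702822, -2.0392980, -1.8820949,
                -2.7713834, -1.1813356, -2.7924824, -0.6780891),
  PC2_Score = c(0.3736151, 0.8972191, 0.0616271, 0.6153983, 1.5216851,
                -0.3594805, -0.1271337, -2.2059502, -0.1181095),
  stringsAsFactors = FALSE
)

# Average position (centroid) of each compatibility level in PCA space
centroids <- aggregate(cbind(PC1_Score, PC2_Score) ~ Classification.Output,
                       data = scoresSubset, FUN = mean)
print(centroids)

# Dishes in the PC1/PC2 plane, coloured by compatibility level
p <- ggplot(scoresSubset, aes(x = PC1_Score, y = PC2_Score, colour = Classification.Output)) +
  geom_point(size = 2) +
  labs(x = "PC1 score", y = "PC2 score", colour = "Compatibility")
print(p)
```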
For dishes labeled as “Highly Compatible,” the analysis proceeds to identify pairs of ingredients. This is achieved with the combn function, which forms every pair of the three ingredients in each trio and concatenates the two names with a " + " separator.
```r
# Identify ingredient pairs for rows labeled as "Highly Compatible":
# each trio yields 3 pairs, and apply() returns them as a 3 x n matrix
highlyCompatiblePairs <- apply(
  ingredients[taste_data$HighlyCompatible, ], 1,
  function(x) combn(x, 2, paste, collapse = " + ")
)
```
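For a single trio, the `combn()` call works as follows; the trio is one of the Highly Compatible rows in this report's score table.

```r
# One "Highly Compatible" trio from the score table
trio <- c("Pumpkin", "Cinnamon", "Ginger")

# combn() forwards collapse = " + " to paste(), so each 2-element combination
# becomes one string; a trio always yields choose(3, 2) = 3 pairs
pairs <- combn(trio, 2, paste, collapse = " + ")
pairs
# "Pumpkin + Cinnamon" "Pumpkin + Ginger" "Cinnamon + Ginger"
```

Because `apply()` runs this once per dish, the pairs for all Highly Compatible dishes come back as a matrix with three rows and one column per dish.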
| Ingredient.1 | Ingredient.2 | Ingredient.3 | Classification.Output | PC1_Score | PC2_Score | HighlyCompatible |
|---|---|---|---|---|---|---|
| Pumpkin | Allspice | Bay Leaf | Highly Compatible | -0.8910157 | 0.3736151 | TRUE |
| Pumpkin | Cinnamon | Ginger | Highly Compatible | -0.6678456 | 0.6863480 | TRUE |
| Pumpkin | Pasta | Butter | Moderately Compatible | -2.0165888 | 0.8972191 | FALSE |
| Pumpkin | Apples | Curry | Moderately Compatible | -1.2741037 | 0.7832284 | FALSE |
| Pumpkin | Brown Sugar | Pine Nuts | Highly Compatible | -1.2241649 | 0.9760878 | TRUE |
| Pumpkin | Garlic | Butter | Highly Compatible | -2.0392980 | 0.6153983 | TRUE |
| Pumpkin | Chile Peppers | Garlic | Moderately Compatible | -1.6180660 | 1.1172530 | FALSE |
| Pumpkin | Cream Cheese | Orange | Compatible | -1.6702822 | 0.0616271 | FALSE |
| Pumpkin | Pumpkin Seeds | Cream Cheese | Highly Compatible | -0.9001725 | -1.0781422 | TRUE |
| Pumpkin | Honey | Balsamic Vinegar | Highly Compatible | -1.5442430 | 1.7367243 | TRUE |
| Pumpkin | Olive Oil | Rosemary | Highly Compatible | -2.1103365 | -0.0111933 | TRUE |
| Mushroom | Thyme | Garlic | Highly Compatible | -1.8820949 | 1.5216851 | TRUE |
| Mushroom | Butter | Sage | Highly Compatible | -1.5691672 | 2.2084197 | TRUE |
| Mushroom | Onion | Rosemary | Moderately Compatible | -2.7713834 | -0.3594805 | FALSE |
| Mushroom | Cream | Parmesan Cheese | Highly Compatible | -2.6860904 | -1.2398302 | TRUE |
| Mushroom | Red Wine | Shallots | Highly Compatible | -2.7924824 | -2.2059502 | TRUE |
| Mushroom | Spinach | Feta Cheese | Moderately Compatible | -0.6780891 | -0.1181095 | FALSE |
| Mushroom | Balsamic Vinegar | Cherry Tomatoes | Compatible | -1.1813356 | -0.1271337 | FALSE |
| Mushroom | Soy Sauce | Ginger | Highly Compatible | -2.3436873 | 0.8851024 | TRUE |
| Mushroom | Lemon | Dill | Highly Compatible | -1.3630086 | 0.1256193 | TRUE |
| Mushroom | Bacon | Cheddar Cheese | Moderately Compatible | -1.6729146 | 0.1379083 | FALSE |
| Mushroom | Olive Oil | Thyme | Moderately Compatible | -2.1951774 | -0.2783752 | FALSE |
| Mushroom | Cream Cheese | Chives | Highly Compatible | -2.0705426 | 0.1164483 | TRUE |
| Mushroom | Garlic | Lemon Zest | Highly Compatible | -2.4689787 | -0.4841222 | TRUE |
| Mushroom | White Wine | Tarragon | Moderately Compatible | -2.9128185 | -0.3381520 | FALSE |
The resulting matrix of ingredient pairs is flattened into a single vector, highlyCompatiblePairs. The occurrences of each unique ingredient pair are then counted using the table function, and a data frame, sortedPairsTable, is created to display the pairs sorted by frequency in descending order.
```r
# Flatten the 3 x n pairs matrix into a single character vector
highlyCompatiblePairs <- as.vector(highlyCompatiblePairs)
# Count occurrences of each ingredient pair
pairOccurrences <- table(highlyCompatiblePairs)
# Build a data frame of pairs and sort it by frequency, highest first
sortedPairsTable <- data.frame(
  IngredientPair = names(pairOccurrences),
  Frequency = as.numeric(pairOccurrences)
)
sortedPairsTable <- sortedPairsTable[order(sortedPairsTable$Frequency, decreasing = TRUE), ]
print(sortedPairsTable)
```
| Row | IngredientPair | Frequency |
|---|---|---|
| 549 | Tomato + Basil | 7 |
| 243 | Garlic + Butter | 5 |
| 16 | Avocado + Cilantro | 4 |
| 26 | Avocado + Lime | 4 |
| 142 | Chicken + Lemon | 4 |
| 174 | Cilantro + Lime | 4 |
| 207 | Cucumber + Red Onion | 4 |
| 341 | Mushroom + Thyme | 4 |
| 518 | Shrimp + Garlic | 4 |
| 519 | Shrimp + Lemon | 4 |
| 526 | Smoked Salmon + Cream Cheese | 4 |
| 585 | Tomato + Red Onion | 4 |
| 614 | Walnuts + Goat Cheese | 4 |
| 59 | Banana + Greek Yogurt | 3 |
| 70 | Basil + Tomato | 3 |
| 128 | Cheddar Cheese + Bacon | 3 |
| 162 | Chickpeas + Lemon | 3 |
| 234 | Eggs + Spinach | 3 |
| 256 | Greek Yogurt + Honey | 3 |
| 271 | Lemon + Capers | 3 |
| 276 | Lemon + Garlic | 3 |
| 294 | Mango + Chili Powder | 3 |
| 311 | Maple Syrup + Mustard | 3 |
| 324 | Mushroom + Garlic | 3 |
| 439 | Potato + Garlic | 3 |
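
In the table, Tomato + Basil (7) and Basil + Tomato (3) are counted as different pairs, because each pair keeps the column order of its trio; merged, they would total 10. A minimal sketch of order-insensitive counting, which sorts the two names in each pair before pasting them. The three trios below are hypothetical toy data, not rows from the dataset.

```r
# Hypothetical trios: the same two ingredients appear in different column positions
trios <- data.frame(
  Ingredient.1 = c("Tomato", "Basil", "Tomato"),
  Ingredient.2 = c("Basil", "Tomato", "Mozzarella"),
  Ingredient.3 = c("Garlic", "Olive Oil", "Basil"),
  stringsAsFactors = FALSE
)

# Sort the two names so "Tomato + Basil" and "Basil + Tomato" become one key
unorderedPair <- function(x) paste(sort(x), collapse = " + ")

# Three pairs per trio, flattened into one vector and counted
pairs <- as.vector(apply(trios, 1, function(row) combn(row, 2, unorderedPair)))
pairCounts <- sort(table(pairs), decreasing = TRUE)
pairCounts
```

Here "Basil + Tomato" is counted three times, once per trio, regardless of which column each ingredient sits in.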
In concluding this exploration of the TasteTrios dataset, it is worth revisiting the hypotheses that guided this analysis. Each hypothesis, grounded in the application of Principal Component Analysis (PCA) and the goal of uncovering distinctive patterns, played a pivotal role in shaping the methodology and expectations.
I hypothesized that applying Principal Component Analysis (PCA) to the dataset would reveal key contributing ingredients. When I examined PC1 and PC2, Walnuts, Potato, and Mushroom (all in the main-ingredient position) had the largest absolute loadings on PC1, while Pork, Tomato, and Beef led PC2, in line with my initial expectation. PCA proved to be a useful tool for distilling essential information from the dataset.
I proposed that dishes classified as “Highly Compatible” would exhibit distinct patterns in the PCA space. Plotting the dishes along PC1 and PC2, I observed clustering and positioning of the highly compatible dishes that I read as support for the compatibility classification. This is consistent with my second hypothesis that taste harmony translates into distinct spatial patterns within the PCA framework.
My third hypothesis centered on the contribution of specific ingredient pairs to the “Highly Compatible” classification. The frequency analysis of ingredient pairs in highly compatible dishes not only confirmed but exceeded my expectations. Recurrent pairs such as Tomato + Basil (7 occurrences), Garlic + Butter (5), and Avocado + Cilantro (4) suggest that certain combinations play a prominent role in achieving the highly compatible classification. These insights now guide me in crafting recipes with a more nuanced understanding of essential pairings.
On reflection, the agreement between my expectations and the results gives me more confidence in the analysis. This journey into the intersection of hypotheses, data, and culinary arts reflects a holistic approach that enriches both my analytical and creative pursuits. Moving forward with these insights, the kitchen becomes a canvas where I can experiment, innovate, and elevate my cooking with a better understanding of taste combinations. I hope these findings are equally useful to people pursuing culinary careers and to anyone cooking at home.