Introduction:

The TasteTrios dataset opens the doors to an exploration into the art of ingredient combinations, focusing on main ingredients complemented by two sub-ingredients. This report delves into the complexity of the dataset, employing Principal Component Analysis (PCA) to unravel the underlying patterns and discover key insights into the compatibility of ingredients.

In the culinary world, the ability to identify ingredients that harmonize well together is a crucial skill. The TasteTrios dataset, with its meticulous categorization and classification into compatibility levels, provides a rich source for uncovering the nuances of flavor combinations. This analysis aims to reveal the primary contributors to flavor variation, identify top ingredient pairs associated with highly compatible dishes, and offer valuable insights for culinary enthusiasts and professionals.

As I navigate through the findings of the analysis, I will explore the top contributing ingredients for Principal Components 1 and 2 (PC1 and PC2), examine scores for individual dishes along these components, and highlight ingredient pairs frequently associated with dishes classified as “Highly Compatible.” These insights pave the way for a deeper understanding of the dataset and provide helpful tips for creating delicious recipes and exploring new flavor profiles.

Hypothesis:

The exploration of the TasteTrios dataset is driven by the following hypotheses:

Hypothesis 1: Principal Component Analysis (PCA) will unveil key contributing ingredients.
I hypothesize that PCA, applied to the dataset, will reveal a set of principal components capturing the primary sources of variation among the ingredients. By identifying the top contributors for PC1 and PC2, I anticipate gaining insights into the ingredients that play a significant role in shaping the overall flavor profiles.

Hypothesis 2: Highly Compatible dishes will exhibit distinct patterns in PCA space.
I hypothesize that dishes classified as “Highly Compatible” will exhibit distinct patterns in the PCA space, as reflected by their scores along PC1 and PC2. I expect to observe clustering or positioning that distinguishes highly compatible dishes from others, providing visual evidence of the effectiveness of the compatibility classification.

Hypothesis 3: Specific ingredient pairs contribute to the Highly Compatible classification.
My third hypothesis suggests that certain pairs of ingredients, when combined, will be more prevalent in dishes labeled as “Highly Compatible.” I anticipate that identifying and analyzing the frequency of these ingredient pairs will provide actionable insights for crafting recipes that align with the highly compatible classification.

Dataset Overview:

Overview:

The TasteTrios dataset brings us into the world of perfect ingredient combinations, focusing on main ingredients that harmonize with two sub-ingredients. With 607 rows and 4 columns, this rich dataset categorizes flavor trios into three distinct compatibility levels: Highly Compatible, Moderately Compatible, and Compatible.

Dataset Structure:

The tabular structure of the dataset encompasses the following key attributes:

Main Ingredient (Ingredient.1): The star of the show, serving as the foundation for the flavor trio.
Sub-Ingredient 1 (Ingredient.2): The first supporting ingredient, contributing to the delightful synergy of flavors.
Sub-Ingredient 2 (Ingredient.3): The second complementary ingredient, enhancing the overall taste experience.
Compatibility Level (Classification.Output): A classification indicating whether the trio is Highly Compatible, Moderately Compatible, or Compatible.

Compatibility Levels:

Highly Compatible: Unveils combinations celebrated for their exceptional synergy, creating culinary masterpieces that resonate widely.
Moderately Compatible: Highlights combinations with a solid balance of flavors, offering versatility and appeal to various tastes.
Compatible: Showcases combinations that work harmoniously, delivering a delightful taste experience suitable for diverse culinary applications.

Dataset Statistics:

Number of Rows: 607
Number of Columns: 4

PCA Analysis:

Loading Libraries and Reading the Dataset.

The analysis commences by loading essential R libraries for data manipulation and visualization, including dplyr, ggplot2, and tidyr.

# Install 'dplyr' package
install.packages("dplyr")

# Install 'ggplot2' package
install.packages("ggplot2")

# Install 'tidyr' package
install.packages("tidyr")

# Load necessary libraries
library(dplyr)
library(ggplot2)
library(tidyr)

The TasteTrios dataset is then read into a data frame named taste_data.
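
The read step itself is not shown in this report. A minimal sketch of how it might look, assuming the data is stored as a CSV file: the file name "TasteTrios.csv" and the helper name read_taste_trios are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: read the TasteTrios CSV into a data frame.
# The path is an assumption; the report does not name the file.
read_taste_trios <- function(path) {
  # check.names = TRUE (the read.csv default) turns header names that are
  # not syntactically valid, e.g. "Ingredient 1", into "Ingredient.1"
  read.csv(path, stringsAsFactors = FALSE, check.names = TRUE)
}

# Usage (assumed file name):
# taste_data <- read_taste_trios("TasteTrios.csv")
# dim(taste_data)   # the report states 607 rows and 4 columns
```

Keeping the ingredients as character columns (stringsAsFactors = FALSE) is a choice made for this sketch; model.matrix converts them to factors when encoding.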

Selecting Relevant Columns and Encoding Categorical Variables.

I select the columns relevant for PCA: “Ingredient.1”, “Ingredient.2”, and “Ingredient.3”. The categorical variables (ingredients) are encoded into numerical indicator columns using one-hot encoding.

# Select relevant columns for PCA
ingredients <- taste_data[, c("Ingredient.1", "Ingredient.2", "Ingredient.3")]

# Encode the categorical variables (ingredients) into numerical values
encodedIngredients <- model.matrix(~ . - 1, data = ingredients)
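
One detail of this encoding is worth checking. With the formula ~ . - 1 and R's default treatment contrasts, model.matrix keeps an indicator column for every level of the first factor only; for each later factor the first level is dropped as a reference. The sketch below shows this on a toy data frame (made up for illustration, not taken from TasteTrios) and, as an alternative that the report itself did not run, how contrasts.arg can request a full indicator set for every column.

```r
# Toy data (made up for illustration, not from TasteTrios)
toy <- data.frame(
  Ingredient.1 = factor(c("Tomato", "Pork", "Tomato")),
  Ingredient.2 = factor(c("Basil", "Paprika", "Garlic"))
)

# Same call as in the report: full indicators for the first factor only
encodedDefault <- model.matrix(~ . - 1, data = toy)
colnames(encodedDefault)

# Alternative: keep an indicator column for every level of every factor
encodedFull <- model.matrix(
  ~ . - 1,
  data = toy,
  contrasts.arg = lapply(toy, contrasts, contrasts = FALSE)
)
colnames(encodedFull)
```

Whether the dropped reference levels matter depends on the question being asked; for reading loadings per ingredient, the full set may be easier to interpret.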

Performing PCA.

Principal Component Analysis (PCA) is performed on the encoded ingredients, scaling the variables to have unit variance.

# Perform PCA
pcaResult <- prcomp(encodedIngredients, scale. = TRUE)
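
Before interpreting PC1 and PC2, it can help to check how much of the total variance they capture. The sketch below computes the proportion of variance from the sdev element of a prcomp result. It runs on a small random matrix standing in for encodedIngredients, so the numbers it produces say nothing about TasteTrios; toyMatrix and toyPca are illustrative names.

```r
# Toy numeric matrix standing in for encodedIngredients (random, illustrative)
set.seed(1)
toyMatrix <- matrix(rnorm(200), nrow = 40, ncol = 5)

toyPca <- prcomp(toyMatrix, scale. = TRUE)

# The variance of each component is its squared standard deviation
varianceExplained <- toyPca$sdev^2 / sum(toyPca$sdev^2)

# Share of total variance captured by the first two components
sum(varianceExplained[1:2])

# summary() reports the same proportions alongside cumulative totals
summary(toyPca)$importance
```

Applied to the report's own object, the same expressions would use pcaResult in place of toyPca.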

Extracting Loadings and Creating a Data Frame.

Loadings, representing the contribution of each original ingredient to PC1 and PC2, are extracted from the PCA results. A data frame named loadingData is created to display ingredient names along with their loadings.

# Extract loadings
loadings <- pcaResult$rotation[, 1:2]

# Create a data frame with ingredient names and loadings
loadingData <- data.frame(Ingredient = colnames(encodedIngredients), PC1_Loadings = loadings[, 1], PC2_Loadings = loadings[, 2])

Displaying Top Contributing Ingredients for PC1 and PC2.

The top contributing ingredients for PC1 and PC2 are selected from the loadingData data frame, sorted based on the absolute values of loadings.

# Display the top contributing ingredients for PC1 and PC2
topContributorsPC1 <- head(loadingData[order(abs(loadingData$PC1_Loadings), decreasing = TRUE), ], 5)
topContributorsPC2 <- head(loadingData[order(abs(loadingData$PC2_Loadings), decreasing = TRUE), ], 5)

print("Top Contributors for PC1:")
print(topContributorsPC1)

print("Top Contributors for PC2:")
print(topContributorsPC2)

Top Contributors for PC1:
	PC1_Loadings	PC2_Loadings
Ingredient.1Walnuts	0.3053534	-0.0205575
Ingredient.1Potato	-0.1860897	0.1723672
Ingredient.1Mushroom	-0.1789891	0.0153631
Ingredient.1Blueberries	0.1733859	0.0052043
Ingredient.1Banana	0.1684649	0.0317120

Top Contributors for PC2:
	PC1_Loadings	PC2_Loadings
Ingredient.1Pork	-0.0749252	0.3084544
Ingredient.1Tomato	-0.1223276	-0.2675961
Ingredient.1Beef	-0.1584708	-0.2641685
Ingredient.2Paprika	-0.0538129	0.1746456
Ingredient.3Garlic Powder	-0.0538129	0.1746456

Creating Tables for Top Contributors of PC1 and PC2:

Top Contributors for PC1:
I create a data frame named tablePC1 to store the top contributing ingredients for Principal Component 1 (PC1). This table includes columns for the ingredient names and their corresponding loadings on PC1.

# Create a table for top contributors of PC1
tablePC1 <- data.frame(
  Ingredient = topContributorsPC1$Ingredient,
  PC1_Loadings = topContributorsPC1$PC1_Loadings
)

print("Top Contributors for PC1:")
print(tablePC1)

Top Contributors for PC1:
Ingredient	PC1_Loadings
Ingredient.1Walnuts	0.3053534
Ingredient.1Potato	-0.1860897
Ingredient.1Mushroom	-0.1789891
Ingredient.1Blueberries	0.1733859
Ingredient.1Banana	0.1684649

Top Contributors for PC2:
A similar table, tablePC2, is created to store the top contributing ingredients for Principal Component 2 (PC2). This table also includes columns for the ingredient names and their corresponding loadings on PC2.

# Create a table for top contributors of PC2
tablePC2 <- data.frame(
  Ingredient = topContributorsPC2$Ingredient,
  PC2_Loadings = topContributorsPC2$PC2_Loadings
)

print("Top Contributors for PC2:")
print(tablePC2)

Top Contributors for PC2:
Ingredient	PC2_Loadings
Ingredient.1Pork	0.3084544
Ingredient.1Tomato	-0.2675961
Ingredient.1Beef	-0.2641685
Ingredient.2Paprika	0.1746456
Ingredient.3Garlic Powder	0.1746456

Extracting Scores and Modifying the Original Dataset:

Scores along PC1 and PC2 are extracted from the PCA results and added to the original dataset taste_data. These scores represent the positioning of each dish in the PCA space.

# Extract scores for each dish along PC1 and PC2
scores <- pcaResult$x

# Add scores to the original dataset
taste_data$PC1_Score <- scores[, 1]
taste_data$PC2_Score <- scores[, 2]
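
To make concrete what a score is: with scale. = TRUE, each row's scores are its centered and scaled values projected onto the loadings. The following sketch verifies that relationship on a small random matrix (illustrative only; toyMatrix and toyPca are not objects from the report).

```r
# Toy numeric matrix (random, illustrative)
set.seed(42)
toyMatrix <- matrix(rnorm(60), nrow = 20, ncol = 3)
toyPca <- prcomp(toyMatrix, scale. = TRUE)

# Scores are the centered, scaled data projected onto the loadings
manualScores <- scale(toyMatrix) %*% toyPca$rotation

# Compare with the scores stored by prcomp
all.equal(unname(manualScores), unname(toyPca$x))
```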

Adding Highly Compatible Classification:

A new column named HighlyCompatible is added to the dataset, indicating whether each dish is classified as “Highly Compatible.” This classification is based on comparing the Classification.Output column with the value “Highly Compatible.”

# Add a column indicating the Highly Compatible classification
taste_data$HighlyCompatible <- (taste_data$Classification.Output == "Highly Compatible")
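
With the scores and the flag in place, the positions of dishes in PCA space can be inspected. The sketch below is one possible way to do this, using base R graphics to stay dependency-free. It uses only five rows copied from the preview of taste_data printed in this report, so it illustrates the mechanics rather than the full picture; the data frame name preview and the colors are choices made for this example.

```r
# A few rows copied from the report's preview of taste_data
preview <- data.frame(
  PC1_Score = c(-0.8910157, -0.6678456, -2.0165888, -1.2741037, -1.2241649),
  PC2_Score = c(0.3736151, 0.6863480, 0.8972191, 0.7832284, 0.9760878),
  HighlyCompatible = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Scatter plot of PC scores, colored by the Highly Compatible flag
plot(
  preview$PC1_Score, preview$PC2_Score,
  col = ifelse(preview$HighlyCompatible, "darkgreen", "grey50"),
  pch = 19, xlab = "PC1 score", ylab = "PC2 score"
)
legend("topright", legend = c("Highly Compatible", "Other"),
       col = c("darkgreen", "grey50"), pch = 19)

# Group centroids give a numeric counterpart to the visual impression
centroids <- aggregate(cbind(PC1_Score, PC2_Score) ~ HighlyCompatible,
                       data = preview, FUN = mean)
print(centroids)
```

On the full dataset, the same calls would take taste_data in place of preview.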

Identifying Ingredient Pairs for Highly Compatible Dishes:

For dishes labeled as “Highly Compatible,” the analysis proceeds to identify pairs of ingredients. This is achieved using the combn function, which forms pairs of ingredients and concatenates them with a “ + ” separator.

# Identify ingredient pairs for rows labeled as "Highly Compatible"
highlyCompatiblePairs <- apply(ingredients[taste_data$HighlyCompatible, ], 1, function(x) combn(x, 2, paste, collapse = " + "))

First 25 rows of taste_data after adding the HighlyCompatible column:
Ingredient.1	Ingredient.2	Ingredient.3	Classification.Output	PC1_Score	PC2_Score	HighlyCompatible
Pumpkin	Allspice	Bay Leaf	Highly Compatible	-0.8910157	0.3736151	TRUE
Pumpkin	Cinnamon	Ginger	Highly Compatible	-0.6678456	0.6863480	TRUE
Pumpkin	Pasta	Butter	Moderately Compatible	-2.0165888	0.8972191	FALSE
Pumpkin	Apples	Curry	Moderately Compatible	-1.2741037	0.7832284	FALSE
Pumpkin	Brown Sugar	Pine Nuts	Highly Compatible	-1.2241649	0.9760878	TRUE
Pumpkin	Garlic	Butter	Highly Compatible	-2.0392980	0.6153983	TRUE
Pumpkin	Chile Peppers	Garlic	Moderately Compatible	-1.6180660	1.1172530	FALSE
Pumpkin	Cream Cheese	Orange	Compatible	-1.6702822	0.0616271	FALSE
Pumpkin	Pumpkin Seeds	Cream Cheese	Highly Compatible	-0.9001725	-1.0781422	TRUE
Pumpkin	Honey	Balsamic Vinegar	Highly Compatible	-1.5442430	1.7367243	TRUE
Pumpkin	Olive Oil	Rosemary	Highly Compatible	-2.1103365	-0.0111933	TRUE
Mushroom	Thyme	Garlic	Highly Compatible	-1.8820949	1.5216851	TRUE
Mushroom	Butter	Sage	Highly Compatible	-1.5691672	2.2084197	TRUE
Mushroom	Onion	Rosemary	Moderately Compatible	-2.7713834	-0.3594805	FALSE
Mushroom	Cream	Parmesan Cheese	Highly Compatible	-2.6860904	-1.2398302	TRUE
Mushroom	Red Wine	Shallots	Highly Compatible	-2.7924824	-2.2059502	TRUE
Mushroom	Spinach	Feta Cheese	Moderately Compatible	-0.6780891	-0.1181095	FALSE
Mushroom	Balsamic Vinegar	Cherry Tomatoes	Compatible	-1.1813356	-0.1271337	FALSE
Mushroom	Soy Sauce	Ginger	Highly Compatible	-2.3436873	0.8851024	TRUE
Mushroom	Lemon	Dill	Highly Compatible	-1.3630086	0.1256193	TRUE
Mushroom	Bacon	Cheddar Cheese	Moderately Compatible	-1.6729146	0.1379083	FALSE
Mushroom	Olive Oil	Thyme	Moderately Compatible	-2.1951774	-0.2783752	FALSE
Mushroom	Cream Cheese	Chives	Highly Compatible	-2.0705426	0.1164483	TRUE
Mushroom	Garlic	Lemon Zest	Highly Compatible	-2.4689787	-0.4841222	TRUE
Mushroom	White Wine	Tarragon	Moderately Compatible	-2.9128185	-0.3381520	FALSE

Counting and Displaying Ingredient Pair Frequencies:

The resulting matrix of ingredient pairs is flattened into a single vector highlyCompatiblePairs. The occurrences of each unique ingredient pair are then counted using the table function, and a data frame sortedPairsTable is created to display the pairs sorted by frequency in descending order.

# Flatten the pairs matrix
highlyCompatiblePairs <- unlist(highlyCompatiblePairs)

# Count occurrences of ingredient pairs
pairOccurrences <- table(highlyCompatiblePairs)

# Display the table sorted by frequency
sortedPairsTable <- data.frame(IngredientPair = names(pairOccurrences), Frequency = as.numeric(pairOccurrences))
sortedPairsTable <- sortedPairsTable[order(sortedPairsTable$Frequency, decreasing = TRUE), ]

print(sortedPairsTable)

Combinations of Ingredients
	IngredientPair	Frequency
549	Tomato + Basil	7
243	Garlic + Butter	5
16	Avocado + Cilantro	4
26	Avocado + Lime	4
142	Chicken + Lemon	4
174	Cilantro + Lime	4
207	Cucumber + Red Onion	4
341	Mushroom + Thyme	4
518	Shrimp + Garlic	4
519	Shrimp + Lemon	4
526	Smoked Salmon + Cream Cheese	4
585	Tomato + Red Onion	4
614	Walnuts + Goat Cheese	4
59	Banana + Greek Yogurt	3
70	Basil + Tomato	3
128	Cheddar Cheese + Bacon	3
162	Chickpeas + Lemon	3
234	Eggs + Spinach	3
256	Greek Yogurt + Honey	3
271	Lemon + Capers	3
276	Lemon + Garlic	3
294	Mango + Chili Powder	3
311	Maple Syrup + Mustard	3
324	Mushroom + Garlic	3
439	Potato + Garlic	3
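
The table lists “Tomato + Basil” and “Basil + Tomato” as separate rows, because combn keeps the ingredients in column order. If the order within a pair is not meant to matter (an assumption; the analysis above counts ordered pairs), the names can be sorted inside each pair before pasting. A minimal sketch on made-up trios, with toyTrios, unorderedPairs, and pairCounts as illustrative names:

```r
# Toy trios (made up for illustration)
toyTrios <- data.frame(
  Ingredient.1 = c("Tomato", "Basil", "Tomato"),
  Ingredient.2 = c("Basil", "Tomato", "Garlic"),
  Ingredient.3 = c("Mozzarella", "Olive Oil", "Basil"),
  stringsAsFactors = FALSE
)

# Build pairs per row, sorting each pair so that A + B and B + A coincide
unorderedPairs <- unlist(lapply(seq_len(nrow(toyTrios)), function(i) {
  trio <- unlist(toyTrios[i, ], use.names = FALSE)
  combn(trio, 2, FUN = function(p) paste(sort(p), collapse = " + "))
}))

# Count occurrences and sort by frequency
pairCounts <- sort(table(unorderedPairs), decreasing = TRUE)
print(pairCounts)
```

In this toy example the Basil and Tomato combination is counted once per trio regardless of which column each ingredient sits in.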

Conclusion:

In concluding this exploration of the TasteTrios dataset, it’s crucial to revisit the hypotheses that guided this analysis. Each hypothesis, grounded in the application of Principal Component Analysis (PCA) and the goal to uncover distinctive patterns, played a pivotal role in shaping the methodology and expectations.

Hypothesis 1: Unveiling Key Contributors with PCA

I hypothesized that applying Principal Component Analysis (PCA) to the dataset would reveal key contributing ingredients. Indeed, as I delved into PC1 and PC2, it became clear that ingredients like Walnuts, Potatoes, and Mushrooms emerged as significant influencers, validating my initial expectation. PCA has proven to be an effective tool for distilling essential information from the dataset.

Hypothesis 2: Distinct Patterns in PCA Space for Highly Compatible Dishes

I proposed that dishes classified as “Highly Compatible” would exhibit distinct patterns in the PCA space, and the results affirm this expectation. Visualizing highly compatible dishes along PC1 and PC2 revealed visible clustering and positioning, which supports the effectiveness of the compatibility classification. This aligns with my second hypothesis, suggesting that taste harmony translates into distinct spatial patterns within the PCA framework.

Hypothesis 3: Contribution of Specific Ingredient Pairs

My third hypothesis centered around the contribution of specific ingredient pairs to the “Highly Compatible” classification. The frequency analysis of ingredient pairs in highly compatible dishes has not only confirmed but exceeded my expectations. Identifying recurrent pairs like Tomato + Basil, Garlic + Butter, and Avocado + Cilantro affirms that certain combinations play a prominent role in achieving the highly compatible classification. These insights now guide me in crafting recipes with a more nuanced understanding of essential pairings.

In reflection, the alignment of theoretical expectations with practical outcomes strengthens the robustness of my analysis. This journey into the intersection of hypotheses, data, and culinary arts encapsulates a holistic approach that enriches both my analytical and creative pursuits. As I move forward with these insights, the kitchen transforms into a canvas where I can experiment, innovate, and elevate my culinary creations with a newfound understanding of taste combinations. These findings should also be useful to people pursuing culinary careers and to anyone cooking in everyday life.

Feature Importance of ingredient in Compatibility