Cluster Analysis on Yogurt Beverages

This report applies cluster analysis to the article “Ingredient Analysis on Consumer Insight of Yogurt Beverages.” Three attributes highlighted in the article are Taste, Texture, and Branding. Each attribute is analyzed against the three consumer segments the article describes: Variety Seekers, Traditionalists, and Impressionables. Variety Seekers prefer beverages with diverse flavors and textures. Traditionalists prefer simple product descriptors and classic beverage types. Impressionables are attracted to detailed descriptions, emotional appeals, and branding elements. The ratings used below are simulated for each segment, so the results illustrate the method and are not measured consumer data.

# Load required libraries
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(cluster)
library(factoextra)

# Step 1: Simulate consumer ratings on 'Taste' (scale: 1 to 10)
set.seed(123)

# Simulate 100 observations for each consumer segment
variety_seekers <- data.frame(taste = rnorm(100, mean = 9, sd = 0.8))
traditionalists <- data.frame(taste = rnorm(100, mean = 7, sd = 1))
impressionables <- data.frame(taste = rnorm(100, mean = 7.5, sd = 1))

# Combine into a single dataset
taste_data <- rbind(variety_seekers, traditionalists, impressionables)
rownames(taste_data) <- NULL

# Step 2: Scale the data (important for clustering)
scaled_taste <- scale(taste_data)

# Step 3: Use the Elbow method to determine optimal number of clusters
fviz_nbclust(scaled_taste, kmeans, method = "wss") +
  labs(title = "Elbow Method for Optimal Clusters (Taste)")

# Step 4: K-means clustering (try with 3 clusters)
set.seed(123)
kmeans_taste <- kmeans(scaled_taste, centers = 3, nstart = 25)

# Step 5: Add cluster labels to the dataset
taste_clustered <- taste_data %>%
  mutate(cluster = as.factor(kmeans_taste$cluster))

# Step 6: Visualize clusters using a 1D plot (since we only have 'taste')
ggplot(taste_clustered, aes(x = taste, fill = cluster)) +
  geom_histogram(binwidth = 0.5, position = "dodge", color = "black") +
  labs(title = "Consumer Clusters Based on Taste",
       x = "Taste Rating",
       y = "Count") +
  theme_minimal()

# Optional: View head of clustered data
head(taste_clustered)

##       taste cluster
## 1  8.551619       2
## 2  8.815858       2
## 3 10.246967       2
## 4  9.056407       2
## 5  9.103430       2
## 6 10.372052       2

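In the output above, the first six rows are all simulated Variety Seekers, and they fall in cluster 2 with taste ratings between roughly 8.6 and 10.4. The Variety Seekers therefore have the highest taste ratings of the three segments, in line with their simulated mean of 9. Note that k-means cluster numbers are arbitrary labels and do not follow the order in which the segments were simulated.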
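Because k-means cluster numbers are arbitrary, one way to check which segment dominates each cluster is to cross-tabulate the simulated segment against the assigned cluster. The sketch below is a minimal illustration that re-creates the taste simulation with the same seed; the `segment` vector and the names `segment_by_cluster`, `cluster_means`, and `top_cluster` are assumptions introduced here and are not part of the original analysis.

```r
# Re-create the simulated taste ratings so this sketch runs on its own
set.seed(123)
taste <- c(rnorm(100, mean = 9,   sd = 0.8),  # Variety Seekers
           rnorm(100, mean = 7,   sd = 1),    # Traditionalists
           rnorm(100, mean = 7.5, sd = 1))    # Impressionables

# 'segment' is an illustrative label vector (assumption, not in the original data)
segment <- factor(rep(c("Variety Seekers", "Traditionalists", "Impressionables"),
                      each = 100))

set.seed(123)
km <- kmeans(scale(taste), centers = 3, nstart = 25)

# Rows: simulated segment; columns: k-means cluster number
segment_by_cluster <- table(segment, cluster = km$cluster)
print(segment_by_cluster)

# Mean taste rating within each cluster, on the original 1-10 scale
cluster_means <- tapply(taste, km$cluster, mean)
print(round(cluster_means, 2))

# Label of the cluster with the highest mean rating
top_cluster <- names(which.max(cluster_means))
print(top_cluster)
```

Reading down the `top_cluster` column of the table shows how many consumers from each segment landed in the highest-rated cluster, which is a more reliable basis for naming clusters than the cluster number alone.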

# Load necessary libraries
library(tidyverse)
library(cluster)
library(factoextra)

# Step 1: Simulate consumer ratings on 'Texture' (scale: 1 to 10)
set.seed(456)

# Simulate 100 consumers per segment based on article insights
variety_seekers <- data.frame(texture = rnorm(100, mean = 8.5, sd = 0.8))
traditionalists <- data.frame(texture = rnorm(100, mean = 7, sd = 1))
impressionables <- data.frame(texture = rnorm(100, mean = 7.5, sd = 1))

# Combine into one dataset
texture_data <- rbind(variety_seekers, traditionalists, impressionables)
rownames(texture_data) <- NULL

# Step 2: Scale the data
scaled_texture <- scale(texture_data)

# Step 3: Determine optimal number of clusters
fviz_nbclust(scaled_texture, kmeans, method = "wss") +
  labs(title = "Elbow Method for Optimal Clusters (Texture)")

# Step 4: Perform K-means clustering
set.seed(456)
kmeans_texture <- kmeans(scaled_texture, centers = 3, nstart = 25)

# Step 5: Add cluster assignments to the data
texture_clustered <- texture_data %>%
  mutate(cluster = as.factor(kmeans_texture$cluster))

# Step 6: Visualize the clusters using a histogram (1D data)
ggplot(texture_clustered, aes(x = texture, fill = cluster)) +
  geom_histogram(binwidth = 0.5, position = "dodge", color = "black") +
  labs(title = "Consumer Clusters Based on Texture Preference",
       x = "Texture Rating",
       y = "Number of Consumers") +
  theme_minimal()

# Optional: View clustered data
head(texture_clustered)

##    texture cluster
## 1 7.425183       3
## 2 8.997420       2
## 3 9.140700       2
## 4 7.388886       3
## 5 7.928515       3
## 6 8.240751       3

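In the output above, cluster 2 holds the higher texture ratings (about 9.0 and 9.1), while cluster 3 covers mid-range ratings (about 7.4 to 8.2), so cluster 3 is not the highest-rated cluster. All six rows shown are simulated Variety Seekers, the segment given the highest mean texture rating (8.5). The Impressionables were simulated with a mean of 7.5, which overlaps heavily with the Traditionalists' mean of 7.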
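Since the simulated texture means (8.5, 7, and 7.5) sit close together, it may help to check how cleanly the three clusters separate. The sketch below is one possible check, not part of the original analysis: it computes silhouette widths with `silhouette()` from the `cluster` package that the report already loads. The variable names `sil`, `avg_sil_by_cluster`, and `overall_avg_sil` are assumptions introduced for illustration.

```r
library(cluster)  # provides silhouette()

# Re-create the simulated texture ratings so this sketch runs on its own
set.seed(456)
texture <- c(rnorm(100, mean = 8.5, sd = 0.8),  # Variety Seekers
             rnorm(100, mean = 7,   sd = 1),    # Traditionalists
             rnorm(100, mean = 7.5, sd = 1))    # Impressionables

scaled_texture <- scale(texture)

set.seed(456)
km <- kmeans(scaled_texture, centers = 3, nstart = 25)

# Silhouette width per observation: near 1 = well matched to its cluster,
# near 0 = on the border between two clusters, negative = possibly misassigned
sil <- silhouette(km$cluster, dist(scaled_texture))

# Average silhouette width per cluster and overall
avg_sil_by_cluster <- tapply(sil[, "sil_width"], sil[, "cluster"], mean)
overall_avg_sil <- mean(sil[, "sil_width"])

print(round(avg_sil_by_cluster, 3))
print(round(overall_avg_sil, 3))
```

A low average width would suggest that the texture clusters are slices of one overlapping distribution and do not map cleanly onto the three segments.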

# Load required libraries
library(tidyverse)
library(cluster)
library(factoextra)

# Step 1: Simulate 'Branding' preference data (scale: 1 to 10)
set.seed(789)

# Simulated preferences per segment
variety_seekers <- data.frame(branding = rnorm(100, mean = 6, sd = 1))
traditionalists <- data.frame(branding = rnorm(100, mean = 3.5, sd = 1))
impressionables <- data.frame(branding = rnorm(100, mean = 8.5, sd = 0.8))

# Combine into one dataset
branding_data <- rbind(variety_seekers, traditionalists, impressionables)
rownames(branding_data) <- NULL

# Step 2: Scale the data
scaled_branding <- scale(branding_data)

# Step 3: Determine the optimal number of clusters
fviz_nbclust(scaled_branding, kmeans, method = "wss") +
  labs(title = "Elbow Method for Optimal Clusters (Branding)")

# Step 4: Apply K-means clustering
set.seed(789)
kmeans_branding <- kmeans(scaled_branding, centers = 3, nstart = 25)

# Step 5: Add cluster assignments to the data
branding_clustered <- branding_data %>%
  mutate(cluster = as.factor(kmeans_branding$cluster))

# Step 6: Visualize the clustering results
ggplot(branding_clustered, aes(x = branding, fill = cluster)) +
  geom_histogram(binwidth = 0.5, position = "dodge", color = "black") +
  labs(title = "Consumer Clusters Based on Branding Preference",
       x = "Branding Rating",
       y = "Number of Consumers") +
  theme_minimal()

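The Impressionables, reported here in Cluster 3, have the highest branding ratings, consistent with their simulated mean of 8.5. The Variety Seekers (mean 6) fall in the middle, and the Traditionalists (mean 3.5) rate branding lowest. Each segment was simulated with 100 consumers and the means are well separated, so the three clusters should be of similar size.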
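The clustering runs on z-scored data, so the k-means centers are not directly on the 1 to 10 rating scale. A minimal sketch of converting them back and comparing them with the simulated segment means is shown below. It re-creates the branding simulation with the same seed; the names `centers_original` and `comparison` are assumptions introduced for illustration.

```r
# Re-create the simulated branding ratings so this sketch runs on its own
set.seed(789)
branding <- c(rnorm(100, mean = 6,   sd = 1),    # Variety Seekers
              rnorm(100, mean = 3.5, sd = 1),    # Traditionalists
              rnorm(100, mean = 8.5, sd = 0.8))  # Impressionables

scaled_branding <- scale(branding)

set.seed(789)
km <- kmeans(scaled_branding, centers = 3, nstart = 25)

# scale() stores the mean and standard deviation it used as attributes
mu    <- attr(scaled_branding, "scaled:center")
sigma <- attr(scaled_branding, "scaled:scale")

# Undo the z-score (original = scaled * sd + mean), sorted from low to high
centers_original <- sort(as.numeric(km$centers) * sigma + mu)

# Simulated means in the same low-to-high order:
# Traditionalists, Variety Seekers, Impressionables
simulated_means <- c(3.5, 6, 8.5)

comparison <- data.frame(
  simulated_mean = simulated_means,
  cluster_center = round(centers_original, 2),
  cluster_size   = as.integer(km$size[order(km$centers)])
)
print(comparison)
```

If the recovered centers sit close to the simulated means and the cluster sizes are each near 100, the three branding clusters can reasonably be read as the three segments.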

Ysabel Gamon

2025-04-09