Segmenting Consumers Based on Energy Drink Preference
Author
Chaitanya Kumar
Introduction
In today’s competitive beverage market, understanding customer preferences is crucial for successful product positioning and marketing. This report focuses on segmenting customers based on their ratings of five energy drink versions (D1, D2, D3, D4, and D5), which vary in the concentration of a flavoring ingredient. Customer demographic information such as age and gender is also analyzed to provide deeper insights. Through clustering analysis, we aim to identify distinct customer groups and recommend targeted marketing strategies to optimize product appeal and profitability.
### 1. Import the Dataset
library(readr)  # provides read_csv()
# Load the dataset
energy_data <- read_csv("energy_drinks.csv")
# Display the structure of the data
str(energy_data)
### 2. Create a Distance Matrix
library(dplyr)  # provides %>% and select()
# Select only numerical columns for clustering
numerical_data <- energy_data %>%
  select(D1:D5)
# Scale the data
scaled_data <- scale(numerical_data)
# Create a Euclidean distance matrix
distance_matrix <- dist(scaled_data, method = "euclidean")
a. Does the data need to be scaled before computing the distance matrix?
Scaling ensures that all variables contribute equally to the distance computation. Without it, variables with larger ranges or variances dominate the Euclidean distance and bias the result. Because D1 to D5 are all ratings on the same scale, scaling is less critical here than it would be for variables measured in different units. Standardizing is still a reasonable precaution: it prevents a drink whose ratings are more widely spread from carrying extra weight in the distance calculation.
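To make the effect of scaling concrete, here is a minimal sketch on made-up numbers. The data frame `toy` and its columns `wide` and `narrow` are hypothetical and are not taken from the energy drink dataset; the sketch only uses base R functions (`scale()`, `dist()`).

```r
# Hypothetical ratings for three respondents (not from the report's data).
# "wide" is spread over a large range; "narrow" barely varies.
toy <- data.frame(
  wide   = c(1, 5, 9),
  narrow = c(4.0, 4.2, 5.0)
)

# Raw Euclidean distances are driven almost entirely by the "wide" column
raw_dist <- dist(toy, method = "euclidean")

# After standardizing, each column has mean 0 and standard deviation 1
scaled_toy  <- scale(toy)
scaled_dist <- dist(scaled_toy, method = "euclidean")

print(round(as.matrix(raw_dist), 2))
print(round(as.matrix(scaled_dist), 2))

# Column standard deviations before and after scaling
sd_before <- sapply(toy, sd)
sd_after  <- apply(scaled_toy, 2, sd)
print(sd_before)
print(sd_after)
```

Comparing the two distance matrices shows how much of the raw distance comes from the more variable column alone.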
::: {.cell}
```{.r .cell-code}
### 3. Perform Hierarchical Clustering
# Perform hierarchical clustering using the "average" method
hclust_result <- hclust(distance_matrix, method = "average")
# Visualize the dendrogram
plot(hclust_result, main = "Dendrogram of Hierarchical Clustering",
     xlab = "", ylab = "Height")
```
:::
Explanation: The dendrogram provides a visual representation of the hierarchical clustering process. The height of the branches indicates the dissimilarity between clusters, helping us decide the optimal number of clusters for the dataset.
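One way to read the number of clusters off a dendrogram is to look for a large jump in the merge heights. The sketch below illustrates this on simulated data with three well-separated groups. The matrix `toy` and the variables `last_heights` and `gaps` are illustrative assumptions, not part of the report's analysis.

```r
set.seed(42)  # reproducible simulated data

# Three hypothetical groups of 10 points each, centered at 0, 3, and 6
toy <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 6, sd = 0.3), ncol = 2)
)

# Same linkage as in the report: average linkage on Euclidean distances
hc <- hclust(dist(toy), method = "average")

# Heights of the last four merges; a large jump between consecutive
# heights suggests cutting the tree just below that jump
last_heights <- tail(hc$height, 4)
gaps <- diff(last_heights)
print(round(last_heights, 2))
print(round(gaps, 2))

# Cut into 3 groups and count the members of each
groups <- cutree(hc, k = 3)
print(table(groups))
```

On real survey data the jump is rarely this clean, so the heights are best read alongside a quality measure such as the silhouette width.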
Dendrogram
### 4. Visualize the Clustering Results
library(dendextend)  # provides color_branches()
# Create a colored dendrogram
dendrogram <- as.dendrogram(hclust_result)
colored_dend <- color_branches(dendrogram, k = 3)
plot(colored_dend, main = "Colored Dendrogram for 3 Clusters")
Heatmap
library(pheatmap)  # provides pheatmap()
# Visualize data with a heatmap
pheatmap(scaled_data, cluster_rows = TRUE, cluster_cols = FALSE,
         show_rownames = FALSE, main = "Heatmap of Energy Drink Ratings")
Does the heatmap provide evidence of any clustering structure within the energy drinks dataset?
Answer: The heatmap shows distinct blocks of similarity, indicating that certain groups of participants share similar preferences for the energy drinks. This validates the presence of clustering patterns and supports the segmentation approach.
### 5. Create a 3-Cluster Solution
library(cluster)  # provides silhouette()
# Cut the dendrogram into 3 clusters
clusters <- cutree(hclust_result, k = 3)
# Add cluster labels to the original data
energy_data$Cluster <- as.factor(clusters)
# Assess the quality of clustering using silhouette scores
silhouette_scores <- silhouette(clusters, distance_matrix)
mean_silhouette <- mean(silhouette_scores[, "sil_width"])
mean_silhouette
[1] 0.215618
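Explanation: The silhouette score evaluates clustering quality on a scale from -1 to 1. A higher mean silhouette score signifies better-defined and more cohesive clusters. The mean of about 0.22 obtained here is low, which suggests that the three clusters overlap considerably and that the segments should be interpreted with some caution.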
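A common way to put a single silhouette value in context is to compare the mean silhouette width across several candidate values of k. The following is a minimal sketch on simulated data, assuming the `cluster` package is installed. The objects `toy`, `mean_sil`, and `best_k` are hypothetical names introduced for illustration.

```r
library(cluster)  # provides silhouette()

set.seed(1)  # reproducible simulated data

# Two hypothetical, well-separated groups of 20 points each
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2)
)

d  <- dist(toy)
hc <- hclust(d, method = "average")

# Mean silhouette width for k = 2, ..., 5 cuts of the same tree
mean_sil <- sapply(2:5, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})
names(mean_sil) <- paste0("k=", 2:5)
print(round(mean_sil, 3))

# The k with the highest mean silhouette width
best_k <- (2:5)[which.max(mean_sil)]
print(best_k)
```

Applied to the report's `hclust_result` and `distance_matrix`, the same loop would show whether another k gives a higher mean width than the 3-cluster solution.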
### 6. Profile the Clusters
#### a. How do the clusters differ on their average rating of each version of the energy drinks?
library(knitr)       # provides kable()
library(kableExtra)  # provides kable_styling()
# Compute average ratings by cluster
cluster_means <- energy_data %>%
  group_by(Cluster) %>%
  summarise(across(D1:D5, mean))
# Format the table
kable(cluster_means, col.names = c("Cluster", "D1", "D2", "D3", "D4", "D5")) %>%
  kable_styling(full_width = FALSE)
| Cluster | D1 | D2 | D3 | D4 | D5 |
|---|---|---|---|---|---|
| 1 | 2.945578 | 4.811791 | 6.274376 | 6.646259 | 6.603175 |
| 2 | 2.508982 | 4.610778 | 7.323353 | 5.089820 | 2.718563 |
| 3 | 6.642241 | 5.081897 | 3.439655 | 3.159483 | 2.956897 |
library(reshape2)  # provides melt()
library(ggplot2)   # provides ggplot() and the geoms below
# Melt data for plotting
melted_cluster_means <- melt(cluster_means, id.vars = "Cluster")
# Line graph for cluster ratings
ggplot(melted_cluster_means, aes(x = variable, y = value, color = Cluster, group = Cluster)) +
  geom_line(size = 1.2) +
  geom_point(size = 3) +
  labs(title = "Average Ratings of Energy Drink Versions by Cluster",
       x = "Energy Drink Version", y = "Average Rating") +
  theme_minimal()
Explanation: The graph and table show marked differences in ratings across clusters for the five energy drink versions: Cluster 1 rates D4 and D5 highest, Cluster 2 peaks at D3, and Cluster 3 rates D1 highest. These insights allow us to identify the preferred versions for each segment, guiding targeted marketing efforts.
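As a quick check on the profile, the printed cluster means can be re-entered by hand and the highest- and lowest-rated version of each cluster extracted programmatically. This is a small base R sketch; the values are copied from the table of cluster means above, while `top_version`, `low_version`, and `rating_spread` are names introduced here for illustration.

```r
# Cluster means copied from the report's table
cluster_means <- data.frame(
  Cluster = factor(1:3),
  D1 = c(2.945578, 2.508982, 6.642241),
  D2 = c(4.811791, 4.610778, 5.081897),
  D3 = c(6.274376, 7.323353, 3.439655),
  D4 = c(6.646259, 5.089820, 3.159483),
  D5 = c(6.603175, 2.718563, 2.956897)
)

ratings <- as.matrix(cluster_means[, -1])

# Highest- and lowest-rated version per cluster (one entry per row)
top_version <- unname(colnames(ratings)[apply(ratings, 1, which.max)])
low_version <- unname(colnames(ratings)[apply(ratings, 1, which.min)])

# Gap between the best and worst average rating within each cluster
rating_spread <- unname(apply(ratings, 1, function(r) max(r) - min(r)))

profile <- data.frame(
  Cluster = cluster_means$Cluster,
  Top     = top_version,
  Lowest  = low_version,
  Spread  = round(rating_spread, 2)
)
print(profile)
```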
#### b. How do the clusters differ on age and gender?
# Add Cluster column to energy_data
clusters <- cutree(hclust_result, k = 3)  # Replace with your number of clusters
energy_data <- energy_data %>%
  mutate(Cluster = as.factor(clusters))

# Age distribution by cluster
age_distribution <- energy_data %>%
  group_by(Cluster, Age) %>%
  summarise(Count = n(), .groups = "drop") %>%
  group_by(Cluster) %>%
  # Calculate proportions within each cluster
  mutate(Proportion = Count / sum(Count))
# Line graph for age distribution
ggplot(age_distribution, aes(x = Age, y = Proportion, color = Cluster, group = Cluster)) +
  geom_line(size = 1.2) +
  geom_point(size = 3) +
  labs(title = "Age Distribution by Cluster", x = "Age Group", y = "Proportion") +
  theme_minimal()

# Gender distribution by cluster
gender_distribution <- energy_data %>%
  group_by(Cluster, Gender) %>%
  summarise(Count = n(), .groups = "drop") %>%
  group_by(Cluster) %>%
  # Calculate proportions within each cluster
  mutate(Proportion = Count / sum(Count))
# Line graph for gender distribution
ggplot(gender_distribution, aes(x = Gender, y = Proportion, color = Cluster, group = Cluster)) +
  geom_line(size = 1.2) +
  geom_point(size = 3) +
  labs(title = "Gender Distribution by Cluster", x = "Gender", y = "Proportion") +
  theme_minimal()
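The within-cluster proportions computed above can also be cross-checked with a plain contingency table. The sketch below uses base R only and a small made-up data frame. The object `toy` and its values are hypothetical and do not reflect the survey respondents.

```r
# Hypothetical cluster labels and genders for ten respondents
toy <- data.frame(
  Cluster = factor(c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)),
  Gender  = c("F", "M", "F", "F", "F", "M", "F", "M", "M", "F")
)

# Counts of each gender within each cluster
counts <- table(Cluster = toy$Cluster, Gender = toy$Gender)

# Row-wise proportions: each cluster's row sums to 1
props <- prop.table(counts, margin = 1)

print(counts)
print(round(props, 2))
```

Because each row sums to 1, the rows are directly comparable across clusters of different sizes, which is the same normalization as the `Proportion` column in the report's code.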
Heatmap
Explanation of Heatmap:
Distinct Patterns: The heatmap visually demonstrates the clustering structure in the dataset. Rows (participants) are grouped into clusters based on their similarity in ratings for the five energy drink versions (D1–D5).
Blocks of Similarity: The heatmap shows distinct blocks where ratings are similar within each cluster, validating the segmentation approach.
Intensity of Color: Darker shades indicate higher ratings, while lighter shades represent lower ratings.
Clusters with consistent darker blocks for specific drinks highlight a shared preference.
Insights: Consistent with the cluster means, Cluster 1 rates D4 and D5 highest and D1 lowest, indicating an inclination toward bold flavors. Cluster 2 peaks at D3, with low ratings for D1 and D5. Cluster 3 rates D1 highest and D5 lowest.
Graphs
Age Distribution by Cluster:
Cluster 1: Higher proportion of older participants (e.g., 36–45 age group). This is the group that rates the stronger versions (D4 and D5) highest.
Cluster 2: A balanced distribution of participants across all age groups, in line with their preference for the balanced flavor of D3.
Cluster 3: Predominantly younger participants (e.g., 18–25 age group). This is the group that rates the mildest version (D1) highest.
Gender Distribution by Cluster:
Cluster 1: Gender proportions are nearly equal, showing that the stronger versions (D4 and D5) appeal across genders.
Cluster 2: A slight female dominance is observed, suggesting that marketing for D3 can be adjusted to appeal more to women.
Cluster 3: Predominantly male; this is the segment that favors D1.
Clusters
Cluster 1: Preferences: Strong preference for D4 and D5 (both about 6.6), with the lowest ratings for D1 (about 2.9). Demographics: Likely older participants, split evenly between genders. Flavor Preference: Bold and intense. Behavior: Ratings rise with flavor intensity and level off at D4 and D5.
Cluster 2: Preferences: Strong preference for D3 (about 7.3), with low ratings for D1 and D5. Demographics: Mixed age and gender group, leaning slightly female. Flavor Preference: Balanced. Behavior: Enjoys moderate intensity and rates both extremes poorly.
Cluster 3: Preferences: Strong preference for D1 (about 6.6), with the lowest ratings for D5 (about 3.0). Demographics: Predominantly younger and male. Flavor Preference: Subtle and mild. Behavior: Ratings fall steadily as flavor intensity increases.
D1 (Cluster 3):
Highlight its mild and subtle flavor. Engage younger audiences via Instagram, TikTok, and visual storytelling. Position it as a “classic choice” for balanced refreshment.
D3 (Cluster 2):
Emphasize its balanced flavor profile. Leverage digital platforms such as social media and influencer marketing to reach this mixed-age, slightly female-leaning segment. Highlight testimonials showcasing its balanced flavor.
D5 (Cluster 1):
Focus on its bold, intense flavor. Use traditional advertising channels like TV and newspapers to reach older demographics, and consider promoting D4 alongside it, since this cluster rates the two almost equally.
Product Strategy:
Primary Product: D3, the top-rated version in Cluster 2 (about 7.3) and also well rated in Cluster 1 (about 6.3).
Supporting Products: D1 for Cluster 3 and D5 for Cluster 1.
Product Bundles: Combine D1 and D3 or D3 and D5 in promotional bundles to encourage cross-cluster trials.
These insights help align marketing and product strategies with each cluster’s preferences and behaviors, with the aim of improving customer satisfaction and profitability.