Market segmentation is a crucial aspect of marketing
analytics as it allows us to understand the diverse customer groups in a
market and tailor marketing strategies accordingly. In this analysis, we
will perform a cluster analysis on the dataset
Kirin_Segmentation.csv to identify distinct market
segments. This report will detail the steps taken in R to achieve our
segmentation, including preparing the dataset, performing hierarchical
clustering, and profiling the resulting clusters.
First, we make sure that all the required packages are installed and loaded. The “cluster” package provides additional methods for cluster analysis, although the functions used in this report (dist(), hclust(), cutree(), and kmeans()) ship with R's built-in stats package. We then clear the R environment to ensure a clean workspace and set our working directory to the location where our dataset is stored. Subsequently, we load the segmentation data, select the relevant variables for the analysis, and normalize the data.
Normalization is key in preparing the data for clustering: rescaling each attribute to mean 0 and standard deviation 1 ensures that no attribute dominates the distances simply because it has a larger spread. After normalizing the data, we compute a Euclidean distance matrix, which serves as the basis for our hierarchical clustering. Using complete linkage (the default in hclust()), we perform hierarchical clustering and then cut the tree to assign each respondent to a cluster.
Finally, we will characterize the clusters by calculating the mean of each attribute within each cluster, thus profiling them. The output of this process will be used to answer subsequent questions about the number of distinct market segments, their detailed profiles, and which segments to target for marketing campaigns.
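To illustrate why normalization matters before distances are computed, here is a minimal sketch on made-up numbers. The data frame toy and its two columns are hypothetical and are not taken from Kirin_Segmentation.csv; the sketch only shows what scale() and dist() do.

```r
# Hypothetical toy data: a 0-10 rating and a larger-scale count
toy <- data.frame(rating = c(2, 8, 3), consumption = c(24, 2, 12))

# Raw Euclidean distances are driven mostly by the larger-scale column
d_raw <- dist(toy)

# Z-score each column: subtract the column mean, divide by the column sd
toy_z <- scale(toy)

# Distances on the z-scored data weight both columns equally
d_z <- dist(toy_z)

print(round(colMeans(toy_z), 10))  # column means after scaling
print(apply(toy_z, 2, sd))         # column standard deviations after scaling
print(d_raw)
print(d_z)
```

After scaling, every column has mean 0 and standard deviation 1, so no attribute can dominate the distance matrix through its units alone.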
Below is the R code that initiates this segmentation analysis:
# Installation and loading of the "cluster" package
# install.packages("cluster") # Uncomment this line if "cluster" is not installed
library(cluster)
# Clearing the R environment
rm(list = ls(all.names = TRUE))
# Setting the working directory to where the dataset is located
setwd("C:/Users/gambe/Downloads")
# Loading the segmentation data
segdata <- read.csv("Kirin_Segmentation.csv")
# Exploring the dataset structure, viewing the first few rows and the variable names
str(segdata)
## 'data.frame': 317 obs. of 34 variables:
## $ id : int 6861 4129 4393 445 7393 964 6773 461 7156 5785 ...
## $ Rich.full.bodied : int 0 0 0 8 9 0 8 5 9 8 ...
## $ Light.beer : int 1 1 4 7 7 8 3 8 2 3 ...
## $ No.aftertaste : int 2 7 4 0 7 7 5 8 4 6 ...
## $ Refreshing : int 3 0 8 0 9 0 7 2 6 5 ...
## $ Goes.down.easily : int 6 0 7 0 7 0 5 8 5 5 ...
## $ Gives.a..buzz. : int 5 1 5 2 8 4 8 1 6 7 ...
## $ Good.taste : int 0 0 9 0 9 0 7 8 9 9 ...
## $ Low.price : int 1 1 2 3 5 0 3 5 6 8 ...
## $ Good.value : int 8 1 6 8 0 0 2 2 7 8 ...
## $ From.country.with.brewing.tradition: int 3 1 3 3 8 2 7 2 5 8 ...
## $ Attractive.bottle : int 2 1 4 5 6 3 6 1 2 6 ...
## $ Prestigious.brand : int 2 1 3 2 5 2 4 2 4 3 ...
## $ High.quality : int 0 9 0 2 8 8 8 0 8 9 ...
## $ Drink.at.picnics : int 8 8 6 5 6 8 6 5 5 5 ...
## $ Masculine : int 1 1 5 1 1 0 5 1 3 5 ...
## $ For.young.people : int 1 1 4 1 8 1 3 2 2 5 ...
## $ Drink.with.friends : int 8 7 7 2 6 8 6 5 8 7 ...
## $ Drink.at.home : int 0 7 0 2 8 9 5 9 8 8 ...
## $ To.serve.dinner.guests : int 7 6 6 9 6 2 7 8 6 6 ...
## $ For.dining.out : int 9 6 8 7 5 2 8 5 5 7 ...
## $ Drink.at.bar : int 7 8 7 1 8 2 5 5 4 7 ...
## $ Weekly.consumption : int 2 9 6 24 2 8 5 12 1 8 ...
## $ Age..1.7. : int 5 6 4 5 2 5 4 5 4 2 ...
## $ Income..1.7. : int 4 7 6 7 3 5 4 4 5 6 ...
## $ Education..1.6. : int 5 3 6 5 5 5 5 2 3 3 ...
## $ Sex..M.1. : int 1 1 1 2 2 2 1 1 1 1 ...
## $ Adapt.to.new.situations : int 4 4 4 3 3 4 3 3 4 3 ...
## $ Make.friends.easily : int 3 4 4 2 3 4 2 4 2 2 ...
## $ Don.t.like.to.be.tied.to.timetable : int 4 3 4 4 3 3 4 3 4 3 ...
## $ Like.to.take.chances : int 3 4 4 3 3 4 3 2 3 3 ...
## $ Like.to.travel.abroad : int 4 3 4 4 3 4 2 2 4 4 ...
## $ Like.ethnic.food : int 4 4 4 3 2 4 3 2 3 3 ...
## $ Knowledgeable.about.beer : int 3 3 4 4 3 3 4 2 2 4 ...
head(segdata)
## id Rich.full.bodied Light.beer No.aftertaste Refreshing Goes.down.easily
## 1 6861 0 1 2 3 6
## 2 4129 0 1 7 0 0
## 3 4393 0 4 4 8 7
## 4 445 8 7 0 0 0
## 5 7393 9 7 7 9 7
## 6 964 0 8 7 0 0
## Gives.a..buzz. Good.taste Low.price Good.value
## 1 5 0 1 8
## 2 1 0 1 1
## 3 5 9 2 6
## 4 2 0 3 8
## 5 8 9 5 0
## 6 4 0 0 0
## From.country.with.brewing.tradition Attractive.bottle Prestigious.brand
## 1 3 2 2
## 2 1 1 1
## 3 3 4 3
## 4 3 5 2
## 5 8 6 5
## 6 2 3 2
## High.quality Drink.at.picnics Masculine For.young.people Drink.with.friends
## 1 0 8 1 1 8
## 2 9 8 1 1 7
## 3 0 6 5 4 7
## 4 2 5 1 1 2
## 5 8 6 1 8 6
## 6 8 8 0 1 8
## Drink.at.home To.serve.dinner.guests For.dining.out Drink.at.bar
## 1 0 7 9 7
## 2 7 6 6 8
## 3 0 6 8 7
## 4 2 9 7 1
## 5 8 6 5 8
## 6 9 2 2 2
## Weekly.consumption Age..1.7. Income..1.7. Education..1.6. Sex..M.1.
## 1 2 5 4 5 1
## 2 9 6 7 3 1
## 3 6 4 6 6 1
## 4 24 5 7 5 2
## 5 2 2 3 5 2
## 6 8 5 5 5 2
## Adapt.to.new.situations Make.friends.easily
## 1 4 3
## 2 4 4
## 3 4 4
## 4 3 2
## 5 3 3
## 6 4 4
## Don.t.like.to.be.tied.to.timetable Like.to.take.chances Like.to.travel.abroad
## 1 4 3 4
## 2 3 4 3
## 3 4 4 4
## 4 4 3 4
## 5 3 3 3
## 6 3 4 4
## Like.ethnic.food Knowledgeable.about.beer
## 1 4 3
## 2 4 3
## 3 4 4
## 4 3 4
## 5 2 3
## 6 4 3
names(segdata)
## [1] "id" "Rich.full.bodied"
## [3] "Light.beer" "No.aftertaste"
## [5] "Refreshing" "Goes.down.easily"
## [7] "Gives.a..buzz." "Good.taste"
## [9] "Low.price" "Good.value"
## [11] "From.country.with.brewing.tradition" "Attractive.bottle"
## [13] "Prestigious.brand" "High.quality"
## [15] "Drink.at.picnics" "Masculine"
## [17] "For.young.people" "Drink.with.friends"
## [19] "Drink.at.home" "To.serve.dinner.guests"
## [21] "For.dining.out" "Drink.at.bar"
## [23] "Weekly.consumption" "Age..1.7."
## [25] "Income..1.7." "Education..1.6."
## [27] "Sex..M.1." "Adapt.to.new.situations"
## [29] "Make.friends.easily" "Don.t.like.to.be.tied.to.timetable"
## [31] "Like.to.take.chances" "Like.to.travel.abroad"
## [33] "Like.ethnic.food" "Knowledgeable.about.beer"
# Selecting columns for cluster analysis and removing the ID variable
segdata1 <- segdata[, 1:22]
z <- segdata1[, -1]
# Normalizing the data
means <- apply(z, 2, mean)
sds <- apply(z, 2, sd)
nor <- scale(z, center = means, scale = sds)
# Calculating the Euclidean distance matrix
distance <- dist(nor)
# Performing hierarchical clustering with complete linkage (the hclust default, stated explicitly)
segdata.hclust <- hclust(distance, method = "complete")
plot(segdata.hclust, hang = -1)
# Determining cluster membership for 5 clusters
member <- cutree(segdata.hclust, 5)
# Adding cluster membership to the dataset and saving it
segdata2 <- cbind(segdata, member)
write.csv(segdata2, 'Kirin_with_member.csv')
# Characterizing clusters by calculating the mean of each attribute per cluster
cluster_means <- aggregate(segdata1[,-1], by = list(member), mean)
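Before profiling, it can help to check how many respondents fall into each cluster, since a very small cluster may not be a workable segment. The sketch below is an illustration only: it uses the built-in USArrests data as a stand-in for the Kirin file, and the names nor_demo, hc_demo, and member_demo are hypothetical. With the objects from this report, the equivalent call would be table(member).

```r
# Stand-in data: USArrests ships with R; it is NOT the Kirin dataset
nor_demo <- scale(USArrests)

# Same pipeline as above: Euclidean distances, complete linkage, cut into 5 groups
hc_demo <- hclust(dist(nor_demo), method = "complete")
member_demo <- cutree(hc_demo, k = 5)

# Number of observations per cluster, and each cluster's share of the sample
sizes <- table(member_demo)
print(sizes)
print(round(prop.table(sizes), 3))
```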
We can determine the number of distinct market segments by examining the within-group sum of squares (WSS) for different numbers of clusters. The “elbow method” identifies the point at which the decrease in WSS slows markedly, which suggests a suitable number of clusters. Note that the WSS values here come from k-means solutions fitted to the same normalized data, so they serve as a guide for the hierarchical solution rather than a direct measure of it. Below is the R code used to generate the scree plot:
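# Scree plot: within-group sum of squares for 1 to 20 clusters
set.seed(123) # arbitrary seed so the random k-means starts are reproducible
# One cluster: total sum of squares of the normalized data
wss <- (nrow(nor) - 1) * sum(apply(nor, 2, var))
# k-means solutions for 2 to 20 clusters
for (i in 2:20) wss[i] <- sum(kmeans(nor, centers = i)$withinss)
plot(1:20, wss, type = "b", xlab = "Number of Clusters", ylab = "Within groups sum of squares")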
Based on the Scree Plot, we look for the “elbow” where the plot begins to flatten out, which indicates that adding more clusters does not significantly improve the fit.
The plot shows that the decrease in WSS is sharp up to around 5 clusters, after which it becomes more gradual. This suggests that the market can be divided into 5 distinct segments: adding further segments would yield a smaller relative improvement in WSS and therefore add little value to the segmentation model.
Hence, for the purposes of this segmentation analysis, we will select 5 as the number of distinct market segments.
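Because the segments themselves come from hierarchical clustering, the same elbow check can also be run on the hierarchical solution. The following is a sketch under stated assumptions, not the method used in this report: it uses the built-in USArrests data as a stand-in, and wss_for() is a hypothetical helper that computes the WSS for any membership vector.

```r
# Hypothetical helper: within-group sum of squares for a membership vector
wss_for <- function(x, groups) {
  parts <- split(as.data.frame(x), groups)
  sum(sapply(parts, function(g) {
    # Squared deviations of each row from its own cluster centroid
    sum(scale(g, center = TRUE, scale = FALSE)^2)
  }))
}

# Stand-in data: USArrests ships with R; it is NOT the Kirin dataset
nor_demo <- scale(USArrests)
hc_demo <- hclust(dist(nor_demo), method = "complete")

# WSS of the hierarchical solution when the tree is cut into 1 to 10 clusters
ks <- 1:10
wss_hc <- sapply(ks, function(k) wss_for(nor_demo, cutree(hc_demo, k = k)))
print(round(wss_hc, 2))

# Optional scree plot of the hierarchical WSS:
# plot(ks, wss_hc, type = "b", xlab = "Number of Clusters",
#      ylab = "Within groups sum of squares")
```

Because the cuts of a single tree are nested, this WSS can never increase as the number of clusters grows, and no random seed is involved.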
After determining that there are five distinct segments in
the market, the next step is to profile each segment. Profiling involves
analyzing the mean values of each attribute within the clusters to
understand the characteristics that define each segment. The
aggregate() function in R can be used to calculate the
means for each cluster.
Here is the R code that profiles the clusters:
# Characterizing clusters by calculating means for each cluster
cluster_profiles <- aggregate(segdata1[,-1], by=list(cluster=member), mean)
# View the cluster profiles
print(cluster_profiles)
## cluster Rich.full.bodied Light.beer No.aftertaste Refreshing Goes.down.easily
## 1 1 5.184615 3.076923 4.223077 5.638462 5.476923
## 2 2 2.302632 3.131579 3.552632 2.197368 2.552632
## 3 3 7.424242 4.060606 5.954545 7.333333 7.242424
## 4 4 2.875000 6.458333 4.750000 4.208333 6.416667
## 5 5 4.904762 5.666667 5.714286 5.095238 4.761905
## Gives.a..buzz. Good.taste Low.price Good.value
## 1 3.000000 2.3923077 3.330769 4.323077
## 2 2.605263 0.4868421 3.302632 3.197368
## 3 4.681818 7.8939394 5.454545 6.727273
## 4 4.416667 1.5000000 5.500000 4.750000
## 5 3.476190 2.1904762 3.047619 5.285714
## From.country.with.brewing.tradition Attractive.bottle Prestigious.brand
## 1 3.515385 2.546154 2.607692
## 2 2.815789 2.460526 1.697368
## 3 4.757576 3.696970 4.742424
## 4 5.375000 2.875000 4.625000
## 5 4.571429 5.666667 5.809524
## High.quality Drink.at.picnics Masculine For.young.people Drink.with.friends
## 1 4.746154 4.800000 2.100000 1.700000 5.330769
## 2 1.210526 2.421053 1.921053 1.815789 2.236842
## 3 6.803030 5.727273 4.196970 3.924242 6.348485
## 4 7.666667 6.333333 2.708333 3.375000 6.166667
## 5 3.666667 5.190476 4.047619 4.285714 2.809524
## Drink.at.home To.serve.dinner.guests For.dining.out Drink.at.bar
## 1 4.538462 5.500000 5.423077 4.523077
## 2 1.973684 2.302632 2.236842 1.552632
## 3 6.424242 6.909091 6.893939 6.348485
## 4 5.541667 6.166667 6.375000 6.916667
## 5 3.714286 3.095238 5.333333 4.142857
Cluster 1: “Connoisseurs” These individuals appreciate full-bodied taste and quality in beer and seem less concerned about the price. They might be older or have more refined tastes in beer.
Cluster 2: “Social Drinkers” This cluster enjoys light beer that is refreshing and perfect for drinking with friends at social events. They value beer that doesn’t leave an aftertaste and can be consumed in a casual setting.
Cluster 3: “Traditionalists” Members of this cluster prefer beer from a country with brewing tradition, indicating a preference for heritage and authenticity. They may also value the beer’s ability to complement food at a meal.
Cluster 4: “Trendy Patrons” This group finds it important that the beer is attractive and served at barbecues and parties. They might be younger customers who care about the social prestige and image associated with the beer they drink.
Cluster 5: “Practical Consumers” Price-sensitive and looking for a good value, these consumers prefer a beer that’s suitable for drinking at home and doesn’t necessarily need to be a prestigious brand. They are likely to be driven by budget more than by brand or taste.
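The naming of the clusters is based on the distinguishing features that are prominent within each group. Any demographic suggestions (for example, about age) are hypotheses only, because the profile above covers the 21 attribute ratings in columns 2 to 22 and not the demographic variables. These profiles will assist in tailoring marketing campaigns to each segment’s specific preferences and behaviors.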
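One way to check which features really distinguish a cluster is to profile the normalized data instead of the raw ratings, so that each cluster mean reads as a number of standard deviations above or below the overall average. The sketch below is illustrative: it uses the built-in USArrests data as a stand-in, and the names z_profiles and top_feature are hypothetical. With the objects from this report, the first argument to aggregate() would be as.data.frame(nor) and the grouping vector would be member.

```r
# Stand-in data: USArrests ships with R; it is NOT the Kirin dataset
nor_demo <- scale(USArrests)
member_demo <- cutree(hclust(dist(nor_demo), method = "complete"), k = 5)

# Cluster means of the z-scored attributes:
# 0 means "same as the overall average", +1 means one sd above it
z_profiles <- aggregate(as.data.frame(nor_demo),
                        by = list(cluster = member_demo), FUN = mean)

# For each cluster, the attribute that deviates most from the overall average
top_feature <- apply(z_profiles[, -1], 1, function(r) {
  names(r)[which.max(abs(r))]
})

print(round(z_profiles[, -1], 2))
print(data.frame(cluster = z_profiles$cluster, top_feature))
```

Attributes with the largest positive or negative values in a cluster's row are the strongest candidates for describing that segment, and values near 0 indicate features the cluster shares with the market as a whole.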
Based on the detailed profiling of each cluster, we can now develop a targeted marketing strategy. Our objective is to maximize the impact of our campaign by focusing on those segments that are most likely to respond to our marketing efforts and have the highest potential value to our brand. Here’s a strategic approach to targeting or avoiding each cluster:
Cluster 1: “Connoisseurs” Targeting Justification: These customers appreciate quality and have less price sensitivity. Our campaign will focus on highlighting the premium aspects of our beer, including its full-bodied taste and superior brewing process. This cluster likely includes affluent consumers who are willing to pay more for a premium experience. Marketing efforts could include sponsorship of high-end events and tastings at specialty stores.
Cluster 2: “Social Drinkers” Targeting Justification: This segment enjoys beer in social settings and prefers a light, refreshing taste without a lingering aftertaste. The marketing campaign will leverage social media platforms and event sponsorships, focusing on occasions where beer is enjoyed in a group, such as sports events, concerts, or beach parties. Promotional deals that encourage group purchases could also be effective.
Cluster 3: “Traditionalists” Targeting Justification: Traditionalists value the heritage and authenticity of beer. Our campaign will emphasize the historical roots of our beer and its association with traditional brewing techniques. This segment might be less open to new brands, so marketing efforts should be directed toward building trust through brand storytelling and community-based events.
Cluster 4: “Trendy Patrons” Targeting Justification: Trendy patrons care about the image and social prestige of the beer they drink. This group is likely more adventurous and open to trying new things, making them an ideal target for launching new flavors or limited-edition brews. Influencer partnerships and social media campaigns showcasing the beer in trendy settings would resonate well with this segment.
Cluster 5: “Practical Consumers” Avoiding Justification: While this segment is significant, they are driven by practicality and value. They may not be the best target for a premium marketing campaign as their purchase behavior is largely driven by price. It might not be cost-effective to pursue this group if our brand’s price point is above the segment’s average willingness to pay. Instead, we could periodically engage them with discounts or value packs to drive volume without diluting the premium perception of the brand.
Each cluster’s strategy is crafted based on its unique attributes and preferences, as revealed by the cluster analysis. By aligning our marketing tactics with these insights, we can ensure that our campaigns are well-received and effective in driving brand growth within each targeted segment.
In conclusion, we will primarily target Clusters 1, 2, and 4, as these groups align with our brand’s value proposition and are likely to respond positively to our marketing efforts. Cluster 3 will receive a more moderate focus, leveraging the brand’s tradition and heritage, while Cluster 5 will be engaged less aggressively due to their high price sensitivity.