Using Hierarchical Clustering for Market Segmentation

Notice:

This is still an early draft. Let me know if there are any errors or typos.

Keep in mind that no programmer can avoid errors. I strongly agree with this quote from Codecademy: “Errors in your code mean you’re trying to do something cool.”

https://news.codecademy.com/errors-in-code-think-differently/

Segmentation

Objective: divide the target market or customers on the basis of significant features, which can help a company sell more products with lower marketing expenses.

A potentially interesting question is whether some products (or customers) are more alike than others.

Market segmentation

Market segmentation is a strategy that divides a broad target market of customers into smaller, more similar groups, and then designs a marketing strategy specifically for each group. Clustering is a common technique for market segmentation since it automatically finds similar groups given a data set.

Create a product which evokes the needs & wants of the target market

Imagine that you are the Director of Customer Relationships at Apple, and you are interested in understanding consumers’ attitudes towards the iPhone 12 and Google’s Pixel 5. Once the product is created, the ball is in the marketing team’s court. As mentioned above, to understand which groups of customers will be interested in which kinds of features, marketers use a market segmentation strategy. Cluster analysis is designed to address this problem. Doing this helps ensure the product is positioned to the right segment of customers with a high propensity to buy.

Examples of Objectives

1. Identify the type of customers who would respond to a particular offer

2. Identify high spenders among customers who will use the e-commerce channel for festive shopping

3. Identify customers who will default on their credit obligation for a loan or credit card

Example

The file customer_segmentation.csv contains data collected by my students in spring 2020.

Importing data - No need to download R or RStudio

Search for RStudio Cloud, register (or set up a free user account), and log into the cloud environment with your Gmail credentials.

First, upload your dataset (.csv) from your own computer to RStudio Cloud. Make sure the first column is an ID rather than a variable.

Once the dataset is uploaded, you will see it in the right pane of your cloud environment.

Now we will use the readr package and its read_csv function to read the dataset.

library(readr)
mydata <-read_csv('customer_segmentation.csv')

## Rows: 22 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (15): ID, CS_helpful, Recommend, Come_again, All_Products, Profesionalis...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Standardizing data

In the following step, you will standardize your data (i.e., rescale each variable to a mean of 0 and a standard deviation of 1). You can use R's scale function, a generic function whose default method centers and/or scales the columns of a numeric matrix.
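As a minimal sketch of what standardization does, the toy matrix below (hypothetical values, not the survey data) is passed through scale(). Afterwards each column should have a mean of approximately 0 and a standard deviation of 1, whatever its original units were.

```r
# Toy data: two hypothetical survey items measured on different scales
toy <- cbind(item_a = c(1, 2, 3, 4, 5),
             item_b = c(10, 20, 30, 40, 50))

# Center each column to mean 0 and scale it to standard deviation 1
toy_scaled <- scale(toy, center = TRUE, scale = TRUE)

col_means <- colMeans(toy_scaled)       # approximately 0 for every column
col_sds   <- apply(toy_scaled, 2, sd)   # 1 for every column
print(round(col_means, 10))
print(col_sds)
```

Standardizing matters for clustering because distances would otherwise be dominated by whichever variable has the largest numeric range.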

Building the distance matrix and plotting the trees (dendrograms)

Hierarchical clustering (using the function hclust) is an informative way to visualize the data.

We will see if we could discover subgroups among the variables or among the observations.

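# Standardize every column except the ID (column 1)
use <- scale(mydata[, -1], center = TRUE, scale = TRUE)
# Euclidean distances between customers (named use_dist so the dist() function is not masked)
use_dist <- dist(use)
# Note: this step takes distances between the rows of the distance matrix itself.
# The more common approach is hclust(use_dist) directly; the step is kept here
# because the outputs shown below were produced with it.
d <- dist(as.matrix(use_dist))
seg.hclust <- hclust(d)   # apply hierarchical clustering (complete linkage by default)
library(ggplot2)          # not needed for the base plot() call below
plot(seg.hclust)          # draw the dendrogram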
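For comparison, here is a minimal sketch of the more conventional workflow, in which hclust receives the distance matrix of the standardized data directly rather than a second distance computed from it. The data frame toy and its column names are hypothetical stand-ins, not the survey data.

```r
# Toy data: six hypothetical customers rated on two items
toy <- data.frame(ID     = 1:6,
                  item_a = c(1, 1, 2, 5, 5, 4),
                  item_b = c(1, 2, 1, 5, 4, 5))

toy_scaled <- scale(toy[, -1])                       # drop the ID column, then standardize
toy_dist   <- dist(toy_scaled)                       # Euclidean distance by default
toy_hc     <- hclust(toy_dist, method = "complete")  # "complete" is the default linkage

plot(toy_hc, labels = as.character(toy$ID))          # dendrogram with customer IDs as leaf labels
toy_groups <- cutree(toy_hc, k = 2)                  # cut the tree into two clusters
print(table(toy_groups))
```

Applying the same pattern to the tutorial data may give different cluster memberships from those printed in this document, since those were produced with the two-step distance.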

Identifying clustering memberships for each cluster

Imagine that your goal is to find some profitable customers to target. Cutting the tree into clusters lets you see how many customers fall into each group.

groups.3 = cutree(seg.hclust,3)
table(groups.3)  # a good first step: use table() to see how many observations are in each cluster

## groups.3
##  1  2  3 
## 17  2  3

#In the following step, we will find the members in each cluster or group.
mydata$ID[groups.3 == 1]

##  [1]  1  2  3  6  7  8  9 10 11 12 13 14 15 16 17 18 21

mydata$ID[groups.3 == 2]

## [1]  4 22

mydata$ID[groups.3 == 3]

## [1]  5 19 20
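The three lookups above can also be done in one call. The sketch below uses split() on hypothetical IDs and cluster labels (stand-ins for mydata$ID and groups.3) to return one vector of member IDs per cluster.

```r
# Hypothetical IDs and cluster labels standing in for mydata$ID and groups.3
ids      <- c(101, 102, 103, 104, 105, 106)
clusters <- c(1, 1, 2, 1, 3, 2)

# split() returns a list with one vector of IDs per cluster label
members <- split(ids, clusters)
print(members)
```

With the tutorial's objects, the equivalent call would be split(mydata$ID, groups.3).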

Identifying common features of each cluster using the aggregate function

#?aggregate
aggregate(mydata,list(groups.3),median)

##   Group.1 ID CS_helpful Recommend Come_again All_Products Profesionalism
## 1       1 11          1       1.0        1.0            2            1.0
## 2       2 13          3       2.5        1.5            3            1.5
## 3       3 19          2       1.0        3.0            3            2.0
##   Limitation Online_grocery delivery Pick_up Find_items other_shops Gender Age
## 1          1              2        2     3.0          1         2.0      1 2.0
## 2          2              3        3     2.5          2         1.5      1 2.5
## 3          1              2        3     1.0          2         3.0      2 2.0
##   Education
## 1         2
## 2         5
## 3         2

aggregate(mydata,list(groups.3),mean)

##   Group.1       ID CS_helpful Recommend Come_again All_Products Profesionalism
## 1       1 10.76471   1.294118  1.117647   1.235294     1.823529       1.235294
## 2       2 13.00000   3.000000  2.500000   1.500000     3.000000       1.500000
## 3       3 14.66667   2.333333  1.666667   2.666667     3.000000       2.333333
##   Limitation Online_grocery delivery  Pick_up Find_items other_shops   Gender
## 1   1.352941       2.235294 2.235294 2.705882   1.294118    2.647059 1.176471
## 2   2.000000       3.000000 3.000000 2.500000   2.000000    1.500000 1.000000
## 3   2.000000       2.000000 3.000000 1.000000   2.000000    3.000000 2.000000
##        Age Education
## 1 2.411765  3.117647
## 2 2.500000  5.000000
## 3 2.666667  2.333333

aggregate(mydata[,-1],list(groups.3),median)

##   Group.1 CS_helpful Recommend Come_again All_Products Profesionalism
## 1       1          1       1.0        1.0            2            1.0
## 2       2          3       2.5        1.5            3            1.5
## 3       3          2       1.0        3.0            3            2.0
##   Limitation Online_grocery delivery Pick_up Find_items other_shops Gender Age
## 1          1              2        2     3.0          1         2.0      1 2.0
## 2          2              3        3     2.5          2         1.5      1 2.5
## 3          1              2        3     1.0          2         3.0      2 2.0
##   Education
## 1         2
## 2         5
## 3         2

aggregate(mydata[,-1],list(groups.3),mean)

##   Group.1 CS_helpful Recommend Come_again All_Products Profesionalism
## 1       1   1.294118  1.117647   1.235294     1.823529       1.235294
## 2       2   3.000000  2.500000   1.500000     3.000000       1.500000
## 3       3   2.333333  1.666667   2.666667     3.000000       2.333333
##   Limitation Online_grocery delivery  Pick_up Find_items other_shops   Gender
## 1   1.352941       2.235294 2.235294 2.705882   1.294118    2.647059 1.176471
## 2   2.000000       3.000000 3.000000 2.500000   2.000000    1.500000 1.000000
## 3   2.000000       2.000000 3.000000 1.000000   2.000000    3.000000 2.000000
##        Age Education
## 1 2.411765  3.117647
## 2 2.500000  5.000000
## 3 2.666667  2.333333

cluster_means <- aggregate(mydata[,-1],list(groups.3),mean)

Exporting cluster analysis results into Excel from RStudio Cloud

write.csv(groups.3, "clusterID.csv")
write.csv(cluster_means, "cluster_means.csv")
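Writing the cluster vector on its own stores the labels by row position only. As a hedged alternative, the sketch below pairs each customer ID with its cluster before exporting. The vectors ids and clusters are hypothetical stand-ins for mydata$ID and groups.3, and the file goes to a temporary path rather than the working directory.

```r
# Hypothetical IDs and cluster labels standing in for mydata$ID and groups.3
ids      <- c(101, 102, 103, 104)
clusters <- c(1, 2, 1, 3)

# One row per customer: the ID next to its cluster label
assignments <- data.frame(ID = ids, cluster = clusters)

# Write to a temporary file here; in the tutorial this could be "clusterID.csv"
out_file <- tempfile(fileext = ".csv")
write.csv(assignments, out_file, row.names = FALSE)  # row.names = FALSE drops the extra index column

# Read the file back to check what was written
check <- read.csv(out_file)
print(check)
```

Keeping the ID in the exported file makes it easier to join the cluster labels back to other customer records in Excel.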

Downloading your solutions manually

First, select the files (“clusterID.csv” & “cluster_means.csv”) and put a checkmark before each file.

Second, click the gear icon on the right side of your pane and export the data.

Finding means or medians of each variable (factor) for each cluster

Imagine if your goal is to find some profitable customers to target. Now using the mean function or the median function, you will be able to see the characteristics of each sub-group. Now it is time to use your domain expertise.

Discussion Questions for you

How many observations do we have in each cluster?

Answer: Based on the output from the clustering analysis, the data is divided into three clusters with uneven sizes. Cluster 1 contains 17 observations, making it the largest and likely representing the dominant customer segment. Cluster 2 has only 2 observations and Cluster 3 has 3 observations, indicating much smaller, more niche segments. This imbalance suggests that most customers share similar characteristics, while a few behave quite differently. Recognizing cluster sizes is important because larger clusters typically represent more actionable and scalable market segments, whereas smaller clusters may represent specialized or unique customer groups. This distribution also highlights how customer behavior is not evenly spread, reinforcing the need for tailored strategies for each segment.

We can look at the medians (or means) for the variables in each cluster. Why is this important?

Answer: Examining the means or medians for each variable within a cluster is essential because it helps us interpret what each cluster actually represents. Clustering alone only groups similar observations together, but it does not explain why they are similar. By looking at summary statistics, we can identify patterns in behaviors, preferences, or attitudes within each group. For example, one cluster may show higher engagement or a stronger likelihood to recommend a service. This information allows us to translate clusters into meaningful customer profiles, which can then be used to design targeted marketing strategies and improve decision-making. It also helps validate whether the clusters are truly distinct and meaningful rather than random groupings.

Do you think the mean or the median should be used when analyzing the differences among clusters? Why?

Answer: In this case, the median is generally more appropriate than the mean for analyzing differences among clusters. This is because the dataset appears to include ordinal survey data (such as Likert scale responses), where values represent rankings rather than precise numerical distances. The median is less sensitive to outliers and skewed distributions, making it a more reliable measure of central tendency for this type of data. In addition, some clusters have very small sample sizes, which can cause the mean to be heavily influenced by extreme values. Therefore, the median provides a more stable and accurate representation of each cluster’s typical behavior.
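A minimal sketch of this sensitivity, using three hypothetical ratings on a 1-5 scale (roughly the size of the smaller clusters above): a single high response pulls the mean well away from the typical answer, while the median does not move.

```r
# Hypothetical ratings (1-5 scale) from a three-person cluster
ratings <- c(1, 1, 5)

cluster_mean   <- mean(ratings)    # pulled upward by the single rating of 5
cluster_median <- median(ratings)  # stays at the typical response
print(c(mean = cluster_mean, median = cluster_median))
```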

Now we need to understand the common characteristics of each cluster. Our goal is to build a targeting strategy using the profiles of each cluster. What summary measures of each cluster are appropriate in a descriptive sense?

Answer: To effectively describe each cluster, several summary measures should be used together. The most important are the mean or median values for each variable, as these provide insight into the central tendencies of behaviors or preferences within the cluster. Cluster size (number of observations) is also critical, as it indicates the relative importance of each segment. In addition, measures such as standard deviation can help assess variability within clusters, showing how consistent or diverse the group is. Together, these statistics allow us to build a clear and meaningful profile of each cluster, which is essential for developing targeted marketing and segmentation strategies.
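The measures named above can be computed together with table() and aggregate(). The sketch below is an illustration on hypothetical data; the column names satisfaction and recommend and the vector clusters are assumptions, not the tutorial's variables.

```r
# Hypothetical survey responses and cluster labels (not the tutorial data)
toy <- data.frame(satisfaction = c(1, 2, 1, 4, 5, 5),
                  recommend    = c(1, 1, 2, 5, 4, 5))
clusters <- c(1, 1, 1, 2, 2, 2)

# Cluster size: number of observations per cluster
cluster_size <- table(clusters)

# Central tendency and spread of each variable within each cluster
cluster_median <- aggregate(toy, by = list(cluster = clusters), FUN = median)
cluster_sd     <- aggregate(toy, by = list(cluster = clusters), FUN = sd)

print(cluster_size)
print(cluster_median)
print(cluster_sd)
```

For very small clusters the standard deviation rests on only a few observations, so it is best read as a rough indication of spread.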

Any major differences between K-means clustering (https://rpubs.com/utjimmyx/kmeans) and Hierarchical clustering? Which one do you like better? Why? You may refer to the assigned readings.

Answer: K-means and hierarchical clustering differ in both methodology and application. K-means requires the number of clusters to be specified in advance and works by iteratively assigning observations to cluster centers, making it efficient for large datasets. In contrast, hierarchical clustering does not require pre-specifying the number of clusters and instead builds a tree-like structure (dendrogram) that shows how observations are grouped step by step. I prefer hierarchical clustering for this analysis because it provides a visual representation of the clustering process and is better suited for smaller datasets. It also allows for more flexibility in determining the optimal number of clusters after examining the data.
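The practical difference can be sketched on a small hypothetical dataset: kmeans() needs the number of clusters before it runs, while an hclust() tree is built once and can be cut at several values of k afterwards. The matrix toy below is made up for illustration.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed for reproducibility

# Toy data: two well-separated groups of hypothetical customers
toy <- rbind(cbind(c(1, 1, 2), c(1, 2, 1)),
             cbind(c(8, 9, 9), c(8, 8, 9)))

# K-means: the number of clusters (centers) must be chosen before fitting
km <- kmeans(toy, centers = 2, nstart = 10)

# Hierarchical: build the tree once, then cut it at any number of clusters
hc   <- hclust(dist(toy))
hc_2 <- cutree(hc, k = 2)
hc_3 <- cutree(hc, k = 3)

print(table(kmeans = km$cluster, hclust = hc_2))  # cross-tabulate the two 2-cluster solutions
print(table(hc_3))
```

The cluster numbers assigned by the two methods are arbitrary labels, so agreement is judged from the cross-tabulation, not from matching numbers.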

Advanced Questions (optional but highly recommended)

O. The aggregate function is well suited for this task. Should we use mydata or mydata[,-1] along with the aggregate function? Why? Hint: see the results in my tutorial.

Answer: We should use mydata[,-1] when applying the aggregate function because the first column typically represents an ID variable, which is not meaningful for analysis. Including the ID in calculations would distort the results, as it does not reflect any behavioral or numerical characteristic relevant to clustering. By removing the first column, we ensure that the aggregation focuses only on the actual variables of interest, such as customer attitudes or behaviors. This leads to more accurate and interpretable summary statistics for each cluster, which is essential for understanding and comparing the characteristics of different segments. In addition, excluding irrelevant variables like IDs improves the overall quality of the analysis and prevents misleading conclusions.

References

Cluster analysis - reading (p.385-p.399) https://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf

Comparison of similarity coefficients used for cluster analysis with dominant markers in maize (Zea mays L) https://www.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572004000100014&lng=en&nrm=iso

Principal Component Methods in R: Practical Guide http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/118-principal-component-analysis-in-r-prcomp-vs-princomp/