1 Dataset Description

The dataset used here is wholesale.csv, which describes the clients of a wholesale distributor. It records each client's annual spending, in monetary units, across several product categories.

In this article, I will demonstrate how to group the data using K-means clustering (unsupervised learning) to identify underlying patterns and structure. K-means will group customers by how they spend across the product categories. I will then use a classification method (supervised learning), k-nearest neighbors, to predict which cluster a customer belongs to.

The aim is to work out which strategies might suit each type of customer in order to increase the company's sales.

2 Import Library

library(dplyr)
library(tidyr)
library(GGally)
library(gridExtra)
library(factoextra)
library(FactoMineR)
library(plotly)
library(class)
library(caret)
library(ggiraphExtra)

3 Read Dataset

sales <- read.csv("wholesale.csv")
head(sales)

4 Data Preprocessing

anyNA(sales)
## [1] FALSE

The data has no missing values.

glimpse(sales)
## Rows: 440
## Columns: 8
## $ Channel          <int> 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1,…
## $ Region           <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ Fresh            <int> 12669, 7057, 6353, 13265, 22615, 9413, 12126, 7579, 5…
## $ Milk             <int> 9656, 9810, 8808, 1196, 5410, 8259, 3199, 4956, 3648,…
## $ Grocery          <int> 7561, 9568, 7684, 4221, 7198, 5126, 6975, 9426, 6192,…
## $ Frozen           <int> 214, 1762, 2405, 6404, 3915, 666, 480, 1669, 425, 115…
## $ Detergents_Paper <int> 2674, 3293, 3516, 507, 1777, 1795, 3140, 3321, 1716, …
## $ Delicassen       <int> 1338, 1776, 7844, 1788, 5185, 1451, 545, 2566, 750, 2…

The data has 440 rows and 8 columns.

unique(sales$Channel)
## [1] 2 1
unique(sales$Region)
## [1] 3 1 2

Because Channel and Region have only 2 and 3 unique values respectively, we will convert them to factors.

sales_clean <- sales %>% 
  mutate_at(c("Channel", "Region"), as.factor)

glimpse(sales_clean)
## Rows: 440
## Columns: 8
## $ Channel          <fct> 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1,…
## $ Region           <fct> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ Fresh            <int> 12669, 7057, 6353, 13265, 22615, 9413, 12126, 7579, 5…
## $ Milk             <int> 9656, 9810, 8808, 1196, 5410, 8259, 3199, 4956, 3648,…
## $ Grocery          <int> 7561, 9568, 7684, 4221, 7198, 5126, 6975, 9426, 6192,…
## $ Frozen           <int> 214, 1762, 2405, 6404, 3915, 666, 480, 1669, 425, 115…
## $ Detergents_Paper <int> 2674, 3293, 3516, 507, 1777, 1795, 3140, 3321, 1716, …
## $ Delicassen       <int> 1338, 1776, 7844, 1788, 5185, 1451, 545, 2566, 750, 2…

Only the numeric columns will be used for clustering, so I select them.

sales_num <- sales_clean %>% 
  select_if(is.numeric)

glimpse(sales_num)
## Rows: 440
## Columns: 6
## $ Fresh            <int> 12669, 7057, 6353, 13265, 22615, 9413, 12126, 7579, 5…
## $ Milk             <int> 9656, 9810, 8808, 1196, 5410, 8259, 3199, 4956, 3648,…
## $ Grocery          <int> 7561, 9568, 7684, 4221, 7198, 5126, 6975, 9426, 6192,…
## $ Frozen           <int> 214, 1762, 2405, 6404, 3915, 666, 480, 1669, 425, 115…
## $ Detergents_Paper <int> 2674, 3293, 3516, 507, 1777, 1795, 3140, 3321, 1716, …
## $ Delicassen       <int> 1338, 1776, 7844, 1788, 5185, 1451, 545, 2566, 750, 2…

The data needs to be standardized (each column centered to mean 0 and scaled to standard deviation 1) so that all variables are on a similar scale.

sales_scaled <- scale(sales_num)
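
To make the effect of scale() concrete, here is a minimal sketch on a small toy data frame. The values are made up for illustration and are not taken from wholesale.csv. It checks that every scaled column ends up with mean 0 and standard deviation 1, and that the result matches the manual z-score formula.

```r
# Toy data with two columns on very different scales (hypothetical values)
toy <- data.frame(
  Fresh  = c(12000, 7000, 6300, 13200, 22600),
  Frozen = c(200, 1700, 2400, 6400, 3900)
)

# scale() centers each column on its mean and divides by its standard deviation
toy_scaled <- scale(toy)

col_means <- colMeans(toy_scaled)      # approximately 0 for every column
col_sds   <- apply(toy_scaled, 2, sd)  # 1 for every column

# The same transformation written out by hand for one column
manual_fresh <- (toy$Fresh - mean(toy$Fresh)) / sd(toy$Fresh)

# The means and standard deviations used are stored as attributes
print(attr(toy_scaled, "scaled:center"))
print(attr(toy_scaled, "scaled:scale"))
```

The stored attributes are useful later if cluster centers need to be converted back to the original monetary units.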

5 K-means Clustering

We need to find the optimal number of clusters (k) so that the clustering results represent the underlying patterns in the data.

# find the optimal k
fviz_nbclust(
  x = sales_scaled,
  FUNcluster = kmeans,
  method = 'wss'
)

To find the optimal value of k, I will use the Elbow Method. This method involves plotting the within-cluster sum of squares (WSS) against different values of k. The optimal k is identified at the “elbow” point, where the rate of decrease slows down significantly. This point indicates a good trade-off between compactness (small WSS) and complexity (number of clusters).

From the plot above, the elbow appears at k = 5: beyond that point, adding more clusters no longer produces a considerable decrease in the total within-cluster sum of squares.

After finding the optimal k, I will build the clusters and check the size of each one.
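
The quantity behind the elbow plot can also be computed directly. The following is a minimal sketch on synthetic data with three well-separated groups (an assumption made purely for illustration, so the elbow here falls at k = 3 rather than 5). It runs kmeans() for several values of k, collects tot.withinss, and shows how much WSS drops at each step.

```r
set.seed(42)

# Synthetic 2-D data with three well-separated groups (not the wholesale data)
toy <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..8
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# How much WSS decreases when moving from k - 1 to k clusters
elbow_table <- data.frame(
  k    = k_values,
  wss  = round(wss, 1),
  drop = c(NA, round(-diff(wss), 1))
)
print(elbow_table)
```

The elbow is the last k with a large value in the drop column; after it, the drops become small and roughly constant.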

RNGkind(sample.kind = "Rounding")
set.seed(100)

# build the clusters with the optimal k
sales_cluster <- kmeans(sales_scaled,
                        centers = 5)
# check the size of each cluster
sales_cluster$size
## [1] 256  35  41  12  96

The cluster sizes vary widely. The first cluster is the largest, with 256 customers, while the fourth is the smallest, with only 12.
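
K-means starts from randomly chosen centers, so a single run can settle in a poor local optimum. A common safeguard is the nstart argument of kmeans(), which tries several random starts and keeps the solution with the lowest total within-cluster sum of squares. The sketch below uses synthetic data (an assumption for illustration) and is an optional variation: the cluster sizes reported above come from a single start, and adding nstart to the original call could change them.

```r
set.seed(1)

# Synthetic 2-D data (not the wholesale data)
toy <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 2),
  matrix(rnorm(200, mean = 4), ncol = 2),
  matrix(rnorm(200, mean = 8), ncol = 2)
)

# One random start
set.seed(100)
fit_single <- kmeans(toy, centers = 5)

# 25 random starts; kmeans() keeps the best of them
set.seed(100)
fit_multi <- kmeans(toy, centers = 5, nstart = 25)

# Lower total WSS means more compact clusters
print(c(single = fit_single$tot.withinss, multi = fit_multi$tot.withinss))
print(fit_multi$size)
```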

5.1 Cluster Profiling

Create a new column containing the cluster label assigned to each customer by the model built with the optimal k.

sales_num$cluster <- as.factor(sales_cluster$cluster)
head(sales_num)

5.1.1 Grouping Data based on Cluster Label

Group the data by cluster label and compute the mean of each variable to find out the characteristics of each cluster.

sales_centroid <- sales_num %>% 
  group_by(cluster) %>% 
  summarise_all(mean)
sales_centroid
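
The group means computed this way are in the original monetary units, whereas the centers stored in the kmeans object are in standardized units because the model was fitted on scaled data. The two are linked by the scaling attributes. Below is a minimal sketch on toy data (hypothetical values and column names chosen only to resemble the dataset) showing that back-transforming the centers reproduces the per-cluster means of the raw data.

```r
set.seed(7)

# Toy data with two obvious groups (hypothetical values)
toy <- data.frame(
  Fresh = c(rnorm(20, 3000, 300), rnorm(20, 15000, 1000)),
  Milk  = c(rnorm(20, 8000, 500), rnorm(20, 2000, 200))
)

toy_scaled <- scale(toy)
fit <- kmeans(toy_scaled, centers = 2, nstart = 10)

# Centers in standardized units
print(fit$centers)

# Undo the standardization: multiply by the sd, then add the mean
centers_raw <- sweep(fit$centers, 2, attr(toy_scaled, "scaled:scale"), "*")
centers_raw <- sweep(centers_raw, 2, attr(toy_scaled, "scaled:center"), "+")

# Per-cluster means computed directly on the raw data
group_means <- aggregate(toy, by = list(cluster = fit$cluster), FUN = mean)

print(centers_raw)
print(group_means)
```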

5.1.2 Visualize Cluster Profiling

ggRadar(data = sales_num,
        aes(colour = cluster),
        interactive = TRUE)

From the plot above, customers in cluster 4 have the highest purchases of Detergents_Paper, Milk, and Grocery, whereas cluster 3 has the highest purchases of Fresh.

6 Classification

Before splitting the data, check the proportion of each cluster in the data.

prop.table(table(sales_num$cluster))
## 
##          1          2          3          4          5 
## 0.58181818 0.07954545 0.09318182 0.02727273 0.21818182

Split the data into training and test sets in an 80:20 proportion.

RNGkind(sample.kind = "Rounding")
set.seed(123)

index <- sample(x = nrow(sales_num),
                size = nrow(sales_num)*0.8)

sales_train <- sales_num[index, ]
sales_test <- sales_num[-index, ]

Check the proportion of each cluster in the training data.

prop.table(table(sales_train$cluster))
## 
##          1          2          3          4          5 
## 0.55965909 0.08806818 0.10511364 0.03125000 0.21590909

The cluster proportions in the training data are close to those in the full dataset (for example, cluster 1 makes up 56.0% of the training data versus 58.2% overall). The training data is therefore a fairly representative sample and should capture the main clustering patterns present in the full dataset.

To perform classification, I will separate the predictor variables from the label (target variable).
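
With a simple random split, small clusters can end up over- or under-represented by chance. One alternative is a stratified split, which samples 80% within each cluster separately. The sketch below uses base R on a hypothetical label vector that imitates the cluster sizes reported earlier (256, 35, 41, 12, 96); it is not the split used in this article, and caret's createDataPartition() offers a similar stratified split.

```r
set.seed(123)

# Hypothetical labels imitating the imbalanced cluster sizes
labels <- factor(rep(1:5, times = c(256, 35, 41, 12, 96)))

# Sample 80% of the row indices within each class.
# Indexing with sample.int() avoids the pitfall of sample() on a length-1 vector.
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
  idx[sample.int(length(idx), size = floor(0.8 * length(idx)))]
}))

prop_all   <- prop.table(table(labels))
prop_train <- prop.table(table(labels[train_idx]))

# The class proportions in the training part closely follow the full data
print(round(rbind(all = prop_all, train = prop_train), 3))
```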

# predictor
sales_train_x <- sales_train %>% 
  select_if(is.numeric)

sales_test_x <- sales_test %>% 
  select_if(is.numeric)

# target
sales_train_y <- sales_train[, "cluster"]
sales_test_y <- sales_test[, "cluster"]

6.1 Model Making

Predict the clusters using a k-nearest neighbors (kNN) model with k = 5.

cluster_pred <- knn(train = sales_train_x,
                    test = sales_test_x,
                    cl = sales_train_y,
                    k = 5)
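
kNN relies on distances between observations, so variables with large ranges can dominate the result. The model above is fitted on the raw spending values. The sketch below shows one possible variation in which the predictors are standardized using the training means and standard deviations only, so that no information from the test set leaks into the model. It uses the built-in iris data as a stand-in (an assumption for illustration), and it is not the procedure that produced the results reported in this article.

```r
library(class)

set.seed(123)

# 80:20 split on the built-in iris data (stand-in for the wholesale data)
idx     <- sample(nrow(iris), size = 0.8 * nrow(iris))
train_x <- iris[idx, 1:4]
test_x  <- iris[-idx, 1:4]
train_y <- iris$Species[idx]
test_y  <- iris$Species[-idx]

# Standardize the training predictors
train_scaled <- scale(train_x)

# Apply the training means and sds to the test predictors
test_scaled <- scale(test_x,
                     center = attr(train_scaled, "scaled:center"),
                     scale  = attr(train_scaled, "scaled:scale"))

# Fit and predict with k = 5 neighbors
pred <- knn(train = train_scaled,
            test  = test_scaled,
            cl    = train_y,
            k     = 5)

# Share of correct predictions on the test set
print(mean(pred == test_y))
```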

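Add the predictions to the test set. The prediction vector has one value per test row, so it is attached to sales_test; it would not align with the rows of the full sales_num.

sales_test$cluster_pred <- cluster_pred
head(sales_test)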

6.2 Model Evaluation

confusionMatrix(data = cluster_pred,
                reference = sales_test_y)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  1  2  3  4  5
##          1 58  2  1  0  1
##          2  1  2  0  0  0
##          3  0  0  3  0  0
##          4  0  0  0  0  0
##          5  0  0  0  1 19
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9318          
##                  95% CI : (0.8575, 0.9746)
##     No Information Rate : 0.6705          
##     P-Value [Acc > NIR] : 4.685e-09       
##                                           
##                   Kappa : 0.8558          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.9831  0.50000  0.75000  0.00000   0.9500
## Specificity            0.8621  0.98810  1.00000  1.00000   0.9853
## Pos Pred Value         0.9355  0.66667  1.00000      NaN   0.9500
## Neg Pred Value         0.9615  0.97647  0.98824  0.98864   0.9853
## Prevalence             0.6705  0.04545  0.04545  0.01136   0.2273
## Detection Rate         0.6591  0.02273  0.03409  0.00000   0.2159
## Detection Prevalence   0.7045  0.03409  0.03409  0.00000   0.2273
## Balanced Accuracy      0.9226  0.74405  0.87500  0.50000   0.9676

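The model's accuracy on the test data is 0.9318, well above the no-information rate of 0.6705 (the accuracy obtained by always predicting the largest cluster). The per-class results are less even, though: clusters 1 and 5 are recovered almost perfectly, but sensitivity is 0.50 for cluster 2, 0.75 for cluster 3, and 0 for cluster 4, whose single observation in the test data was predicted as cluster 5. The high overall accuracy suggests that k = 5 is a reasonable choice for this dataset, although the small clusters contribute too few test observations to judge the model reliably on them.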
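
The headline figures can be reproduced by hand from the confusion matrix. The sketch below re-enters the counts printed above (rows are predictions, columns are reference labels) and derives the accuracy, the no-information rate, and the per-class sensitivity with base R. The variable names are my own.

```r
# Confusion matrix copied from the output above
# (rows = predicted cluster, columns = reference cluster)
cm <- matrix(c(58, 2, 1, 0,  1,
                1, 2, 0, 0,  0,
                0, 0, 3, 0,  0,
                0, 0, 0, 0,  0,
                0, 0, 0, 1, 19),
             nrow = 5, byrow = TRUE,
             dimnames = list(Prediction = 1:5, Reference = 1:5))

# Accuracy: correct predictions (the diagonal) over all test observations
accuracy <- sum(diag(cm)) / sum(cm)

# No-information rate: share of the most frequent reference class
nir <- max(colSums(cm)) / sum(cm)

# Sensitivity per class: correct predictions over actual members of the class
sensitivity <- diag(cm) / colSums(cm)

print(round(accuracy, 4))
print(round(nir, 4))
print(round(sensitivity, 4))
```

Because 59 of the 88 test observations belong to cluster 1, the overall accuracy is driven mostly by that cluster, which is why the per-class sensitivities are worth reading alongside it.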

7 Conclusion

From the radar plot above, we can see the purchasing behavior of the customers in each cluster.

  • Cluster 1 purchases very small amounts of Fresh, Milk, Grocery, Frozen, Detergents_Paper, and Delicassen.

  • Cluster 2 purchases average amounts of Frozen and Fresh products.

  • Cluster 3 purchases a high amount of Fresh products.

  • Cluster 4 purchases high amounts of Milk, Grocery, and Detergents_Paper products.

  • Cluster 5 purchases average amounts of Milk, Grocery, and Detergents_Paper products.

Clusters 3 and 4 spend heavily on specific product types, so we could cross-sell related or complementary items to them. For example, we could promote Frozen products that go well with what these customers already buy.

Clusters 1, 2, and 5 spend little to average amounts, so we could offer these customers free samples or trials of new products. If they like the samples, they may be more inclined to make larger purchases.

We could also run limited-time offers or flash sales that create a sense of urgency and encourage customers to buy more.

In conclusion, the company should tailor its approach to each type of customer, based on that customer's purchasing behavior, in order to increase sales.