Diecast Sales in Several Countries Using K-Means Clustering

Load all the packages

library(cluster)
library(factoextra)
library(caret)
library(rpart)
library(rpart.plot)
library(imager)
library(dplyr)
library(tidyverse)
library(reshape2)

Data Information

Here we can see information about the business environment.

Data

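# Read the raw sales data
data <- read.csv("C:/FIDYS/KULIAH/SMT 7/Bisnis Analitik/After ETS/EAS1/sales_data_sample.csv")
# Keep the three numeric features used for clustering
df1 <- data %>% select(c(quantity, price, tax))
# Standardize each feature to mean 0 and standard deviation 1
df <- as.data.frame(scale(df1))
head(df)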
##      quantity       price        tax
## 1 -0.52279823  0.59687176 -0.5888763
## 2 -0.11218144 -0.11443008 -0.5888763
## 3  0.60639795  0.54928641 -0.5888763
## 4  1.01701475 -0.01975506 -0.5888763
## 5  1.42763154  0.81001447 -0.2975127
## 6  0.09312696  0.64445711 -0.5888763
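
To make the standardization step concrete, here is a minimal sketch on a small toy data frame. The values are made up for illustration and are not taken from the sales file; only the column names mirror the ones used above. It checks that scale() gives the same result as computing the z-score by hand.

```r
# Toy data with hypothetical values; the column names mirror the ones used above
toy <- data.frame(
  quantity = c(10, 20, 30, 40, 50),
  price    = c(100, 150, 90, 200, 120),
  tax      = c(1, 1, 2, 5, 1)
)

# scale() centers each column on its mean and divides by its standard deviation
scaled <- as.data.frame(scale(toy))

# The same z-score computed by hand for one column
manual_quantity <- (toy$quantity - mean(toy$quantity)) / sd(toy$quantity)

print(round(colMeans(scaled), 10))                  # column means are 0 up to rounding
print(apply(scaled, 2, sd))                         # column standard deviations are 1
print(all.equal(scaled$quantity, manual_quantity))  # scale() matches the manual z-score
```

Standardizing matters here because k-means uses Euclidean distance, so a feature measured on a larger scale would otherwise dominate the clustering.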

K-Means Elbow Method

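# Set the seed before any k-means call so the random starts are reproducible
set.seed(123)

# Total within-cluster sum of squares for a given number of clusters k
kmean_withinss <- function(k) {
    cluster <- kmeans(df, k, nstart = 500)
    return(cluster$tot.withinss)
}
max_k <- 20
wss <- sapply(2:max_k, kmean_withinss)

# Elbow plot of the within-cluster sum of squares
fviz_nbclust(df, kmeans, nstart = 500, method = "wss")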

As we can see from the elbow plot above, the curve bends at k = 3, which suggests using 3 clusters.
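
The elbow reading can also be checked numerically. The sketch below uses hypothetical toy data with three well-separated groups, not the sales data, and only base R. It computes the total within-cluster sum of squares for each k and prints how much each extra cluster reduces it; the elbow is where that reduction becomes small.

```r
set.seed(123)

# Hypothetical toy data: three well-separated groups in two dimensions
toy <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2)
)

# Total within-cluster sum of squares for k = 1, ..., 8
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 25)$tot.withinss)

# Reduction in WSS gained by adding one more cluster
drops <- -diff(wss)
print(data.frame(k = ks[-1], wss = wss[-1], drop = drops))

# Base R elbow plot
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

On data like this the reduction is large up to k = 3 and small afterwards. On real data the bend is usually less sharp, so the choice of k remains a judgment call.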

Cluster Size

k <- kmeans(df, centers = 3, nstart = 50)
k$size
## [1]  997  488 1338

This indicates that cluster 1 contains 997 customers, cluster 2 contains 488 customers, and cluster 3 contains 1338 customers.
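
As a small illustration of where these sizes come from, the sketch below fits k-means on hypothetical toy data and tabulates the cluster labels. The counts from table() match the size component of the fit, and prop.table() turns them into shares. Note that the cluster numbers themselves are arbitrary labels and can change between runs.

```r
set.seed(123)

# Hypothetical toy data: one group of 30 points and one group of 20 points
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)
fit <- kmeans(toy, centers = 2, nstart = 25)

# Count how many rows were assigned to each cluster label
sizes <- table(fit$cluster)
print(sizes)

# The same counts expressed as a share of all rows
print(round(prop.table(sizes), 3))
```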

Boxplot Cluster

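# Attach each row's cluster label so melt() and ggplot() can group by it
df$cluster <- factor(k$cluster)
m1 <- melt(df, id.vars = "cluster")
ggplot(m1, aes(x = cluster, y = value, fill = variable)) + geom_boxplot() + theme_classic() + facet_wrap(~cluster, scales = "free") + ylim(-2, 6)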

Based on the boxplots, the first cluster consists of customers who buy in high quantities. Cluster 2 consists of customers who tend to incur higher tax. Finally, cluster 3 consists of customers whose purchase prices are the highest.
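
A reading like this can be cross-checked with a table of per-cluster means. The sketch below is an assumption-laden illustration: the data is simulated so that each group is high on exactly one feature, and only the column names mirror those used above. It computes the mean of each feature within each cluster and reports which feature is highest.

```r
set.seed(123)

# Hypothetical toy data: each block of 30 rows is high on exactly one feature
toy <- data.frame(
  quantity = c(rnorm(30, 2, 0.3), rnorm(30, 0, 0.3), rnorm(30, 0, 0.3)),
  price    = c(rnorm(30, 0, 0.3), rnorm(30, 0, 0.3), rnorm(30, 2, 0.3)),
  tax      = c(rnorm(30, 0, 0.3), rnorm(30, 2, 0.3), rnorm(30, 0, 0.3))
)
fit <- kmeans(toy, centers = 3, nstart = 25)
toy$cluster <- factor(fit$cluster)

# Mean of each feature within each cluster
cluster_means <- aggregate(cbind(quantity, price, tax) ~ cluster,
                           data = toy, FUN = mean)
print(cluster_means)

# For each cluster, the feature with the highest mean
dominant <- names(cluster_means)[-1][apply(cluster_means[, -1], 1, which.max)]
print(data.frame(cluster = cluster_means$cluster, dominant = dominant))
```

Applied to the standardized sales data, the same aggregate() call would give a numeric summary to set beside the boxplots.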