library(cluster)
library(factoextra)
library(caret)
library(rpart)
library(rpart.plot)
library(imager)
library(dplyr)
library(tidyverse)
library(reshape2)
Here we can see information about the business environment.
## Data
data<-read.csv("C:/FIDYS/KULIAH/SMT 7/Bisnis Analitik/After ETS/EAS1/sales_data_sample.csv")
df1<-data%>%select(c(quantity, price, tax))
df<-as.data.frame(scale(df1))
head(df)
## quantity price tax
## 1 -0.52279823 0.59687176 -0.5888763
## 2 -0.11218144 -0.11443008 -0.5888763
## 3 0.60639795 0.54928641 -0.5888763
## 4 1.01701475 -0.01975506 -0.5888763
## 5 1.42763154 0.81001447 -0.2975127
## 6 0.09312696 0.64445711 -0.5888763
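# Total within-cluster sum of squares for a k-means fit with k clusters
kmean_withinss <- function(k) {
  cluster <- kmeans(df, k, nstart = 500)
  return(cluster$tot.withinss)
}
max_k <- 20
set.seed(123)  # set the seed before the first k-means call so the results are reproducible
wss <- sapply(2:max_k, kmean_withinss)
fviz_nbclust(df, kmeans, nstart = 500, method = "wss")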
As the plot above shows, the elbow suggests that three clusters is an appropriate choice.
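A reading of the elbow plot can be cross-checked numerically. The sketch below is a minimal illustration and not part of the analysis above: it assumes the built-in `iris` measurements as a stand-in for the sales data, and the names `toy`, `wss_for_k`, and `drop_pct` are hypothetical. It prints the percentage drop in total within-cluster sum of squares gained by each additional cluster. The elbow sits roughly where these drops become small.

```r
# Stand-in data (assumption): scaled numeric columns of the built-in iris set.
toy <- as.data.frame(scale(iris[, 1:4]))

# Total within-cluster sum of squares for a given number of clusters.
wss_for_k <- function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
}

set.seed(123)  # fix the random starts so the result is reproducible
ks <- 1:8
wss <- sapply(ks, wss_for_k)

# Percentage reduction in WSS gained by going from k - 1 to k clusters.
drop_pct <- 100 * -diff(wss) / head(wss, -1)
print(data.frame(k = ks[-1], drop_pct = round(drop_pct, 1)))
```

On the real data, `toy` would be replaced by the scaled `df` used above.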
k <- kmeans(df, centers = 3, nstart = 50)
k$size
## [1] 997 488 1338
This indicates that cluster 1 contains 997 customers, cluster 2 contains 488 customers, and cluster 3 contains 1338 customers.
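# Attach the cluster labels; melt() needs a "cluster" column to use as the id variable
df$cluster <- factor(k$cluster)
m1 <- melt(df, id.vars = "cluster")
ggplot(m1, aes(x = cluster, y = value, fill = variable)) + geom_boxplot() + theme_classic() + facet_wrap(~cluster, scales = "free") + ylim(-2, 6)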
Based on the boxplots by cluster, the first cluster consists of customers who buy in large quantities. Cluster 2 consists of customers who tend to incur higher tax. Finally, cluster 3 consists of customers whose purchase prices are the highest.
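An interpretation read off boxplots can be backed up with a table of per-cluster means. The following is a minimal sketch under stated assumptions: it uses the built-in `iris` measurements as a stand-in for the sales data, and `toy`, `fit`, `profile`, and `top_var` are hypothetical names. It summarizes each cluster by the mean of every standardized variable and reports which variable is highest in each cluster.

```r
# Stand-in data (assumption): scaled numeric columns of the built-in iris set.
toy <- as.data.frame(scale(iris[, 1:4]))

set.seed(123)  # fix the random starts so the result is reproducible
fit <- kmeans(toy, centers = 3, nstart = 25)

# Mean of each scaled variable within each cluster; one row per cluster.
profile <- aggregate(toy, by = list(cluster = fit$cluster), FUN = mean)
print(round(profile, 2))

# For each cluster, the variable with the highest standardized mean.
top_var <- names(toy)[apply(profile[, -1], 1, which.max)]
print(data.frame(cluster = profile$cluster, top_var = top_var))
```

Because the variables are standardized, a mean well above 0 marks a variable on which the cluster is above the overall average, which is the same comparison the boxplots make visually.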