This example was produced as a supplementary study for module 4 of the course “Introdução ao Marketing Analítico” (Introduction to Analytical Marketing), offered by INSPER on Coursera.org.
It uses a small data set of daily personal spending (“gastos diários”) on food (comida) and clothing (roupas) to demonstrate some cluster analysis techniques.
dt <- read.table("Gastos_Diarios.csv", sep=',', header=TRUE, row.names=1)
names(dt) <- c("comida","roupas")
print(head(dt))
## comida roupas
## a 2.0 4
## b 8.0 2
## c 9.0 3
## d 1.0 5
## e 8.5 1
Prepare data - standardization
dts <- na.omit(dt) # listwise deletion of rows with missing values
dts <- scale(dts) # standardize variables
print(dts)
## comida roupas
## a -0.9569321 0.6324555
## b 0.5948497 -0.6324555
## c 0.8534800 0.0000000
## d -1.2155624 1.2649111
## e 0.7241648 -1.2649111
## attr(,"scaled:center")
## comida roupas
## 5.7 3.0
## attr(,"scaled:scale")
## comida roupas
## 3.866523 1.581139
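As a rough illustration of what scale() does here, the sketch below standardizes the same five rows by hand: it subtracts each column mean and divides by each column's sample standard deviation. The data frame is typed in from the values printed above rather than read from the CSV file, and the names centers, sds, manual and same are only illustrative.

```r
# Data copied from the printed output above (five people, two spending variables).
dt <- data.frame(
  comida = c(2, 8, 9, 1, 8.5),
  roupas = c(4, 2, 3, 5, 1),
  row.names = c("a", "b", "c", "d", "e")
)

# Standardize by hand: (value - column mean) / column sample standard deviation.
centers <- colMeans(dt)
sds <- apply(dt, 2, sd)
manual <- sweep(sweep(as.matrix(dt), 2, centers, "-"), 2, sds, "/")

# Compare with scale(); check.attributes = FALSE ignores the "scaled:*" attributes.
same <- isTRUE(all.equal(manual, scale(dt), check.attributes = FALSE))
print(same)
print(round(manual, 4))
```

The centers and sds vectors should match the "scaled:center" and "scaled:scale" attributes shown in the output above.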
Visualizing the Dendrogram
# https://rpubs.com/gaston/dendrograms
hc <- hclust(dist(dt)) # hierarchical clustering on the raw (unscaled) data; complete linkage by default
plot(hc, xlab="pessoas") # very simple dendrogram
box(which="figure",lty="solid",col="red",bg="yellow")
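The dendrogram can also be turned into explicit group labels. The sketch below is one possible way to do that with cutree() from the stats package, cutting the tree into two groups. It assumes the same five rows printed earlier (typed in by hand here) and the same unscaled distances as the hclust() call above. The name groups is illustrative.

```r
# Data copied from the printed output earlier in this example.
dt <- data.frame(
  comida = c(2, 8, 9, 1, 8.5),
  roupas = c(4, 2, 3, 5, 1),
  row.names = c("a", "b", "c", "d", "e")
)

# Same call as above: Euclidean distances, complete linkage (the hclust default).
hc <- hclust(dist(dt))

# Cut the tree into k = 2 groups; the result is a named integer vector.
groups <- cutree(hc, k = 2)
print(groups)
```

With this data, people a and d should fall into one group and b, c and e into the other, which is what the dendrogram suggests visually.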
Choosing the number of clusters
# http://www.statmethods.net/advstats/cluster.html
wss <- (nrow(dts)-1)*sum(apply(dts,2,var)) # within-groups sum of squares for a single cluster (k = 1)
# nCenters <- nrow(dts)-1 # (maximum allowed)
nCenters <- 2 # (chosen visually, observing dendrogram image)
Partitioning by K-means
for (i in 2:nCenters) {
  wss[i] <- sum(kmeans(dts, centers=i)$withinss)
}
plot(1:nCenters, wss, type="b", xlab="Number of Clusters", ylab="Within groups sum of squares")
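The first value of wss, for a single cluster, comes from the formula (n - 1) times the sum of the column variances. The sketch below checks that this equals the sum of squared deviations from the column means, computed directly. It assumes the five rows printed earlier, typed in by hand. The names wss_formula and wss_direct are illustrative.

```r
# Data copied from the printed output earlier in this example.
dt <- data.frame(
  comida = c(2, 8, 9, 1, 8.5),
  roupas = c(4, 2, 3, 5, 1),
  row.names = c("a", "b", "c", "d", "e")
)
dts <- scale(dt)
n <- nrow(dts)

# Formula used in the text: (n - 1) * sum of column variances.
wss_formula <- (n - 1) * sum(apply(dts, 2, var))

# Direct computation: squared distance of every point to the overall centroid.
wss_direct <- sum(sweep(dts, 2, colMeans(dts), "-")^2)

print(c(formula = wss_formula, direct = wss_direct))
```

Because each standardized column has variance 1, both values come out as (5 - 1) * 2 = 8 for this data set.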
K-Means Cluster Analysis
fit <- kmeans(dts, nCenters)
# get cluster means
aggregate(dts,by=list(fit$cluster),FUN=mean)
## Group.1 comida roupas
## 1 1 -1.0862473 0.9486833
## 2 2 0.7241648 -0.6324555
# append cluster assignment
dt.cluster <- data.frame(dts, fit$cluster)
print(dt.cluster)
## comida roupas fit.cluster
## a -0.9569321 0.6324555 1
## b 0.5948497 -0.6324555 2
## c 0.8534800 0.0000000 2
## d -1.2155624 1.2649111 1
## e 0.7241648 -1.2649111 2
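The cluster means above are in standardized units. To read them in the original spending units, one option is to reverse the scaling using the attributes that scale() stores. The sketch below assumes the five rows printed earlier, typed in by hand. The seed and nstart = 25 are my own choices to make the run repeatable and are not part of the original analysis. The name centers_orig is illustrative.

```r
# Data copied from the printed output earlier in this example.
dt <- data.frame(
  comida = c(2, 8, 9, 1, 8.5),
  roupas = c(4, 2, 3, 5, 1),
  row.names = c("a", "b", "c", "d", "e")
)
dts <- scale(dt)

set.seed(1)                                   # assumption: arbitrary seed for repeatability
fit <- kmeans(dts, centers = 2, nstart = 25)  # assumption: several random starts

# Undo the standardization: multiply by the stored sd, then add the stored mean.
centers_orig <- sweep(fit$centers, 2, attr(dts, "scaled:scale"), "*")
centers_orig <- sweep(centers_orig, 2, attr(dts, "scaled:center"), "+")
print(round(centers_orig, 2))
```

Cluster numbering can differ between runs, so it is safer to compare the rows of centers_orig by their values than by their labels.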
The R code for this example is available to view and download from my GitHub: https://github.com/svicente99/cluster_analysis_example