Cluster Analysis

load the libraries
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
read the file “Shopping_cluster.csv” (modify to your own directory)
shop<-read.csv("~/Desktop/R/Shopping_cluster.csv",header=TRUE,sep=',')
select the variables for cluster analysis
shop_slt<-mutate(shop,caseno=NULL) # exclude caseno in the analysis
let us form 4 clusters
set.seed(6)  # fix the random seed so the random starting centers are reproducible
result<-kmeans(shop_slt,4)
result
## K-means clustering with 4 clusters of sizes 1, 6, 5, 8
## 
## Cluster means:
##        Fun Budget Eating_out Best_buys Dont_care Compare_price
## 1 3.000000  7.000   2.000000     6.000     4.000      3.000000
## 2 1.666667  3.000   1.833333     3.500     5.500      3.333333
## 3 3.600000  5.600   3.600000     6.000     3.400      6.600000
## 4 5.750000  3.625   6.000000     3.125     1.875      3.875000
## 
## Clustering vector:
##  [1] 4 2 4 3 2 4 4 4 2 3 2 4 2 3 4 3 4 1 3 2
## 
## Within cluster sum of squares by cluster:
## [1]  0.0 20.5 10.0 34.0
##  (between_SS / total_SS =  80.4 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
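The components listed above are related by a simple identity: the total sum of squares splits into a within-cluster part and a between-cluster part, and the percentage printed by kmeans is the between-cluster share. The sketch below illustrates this on hypothetical toy data (the data frame `toy` and its columns `x` and `y` are assumptions, not the shopping file); `nstart` is a standard kmeans argument that tries several random starts and keeps the best one.

```r
# Hypothetical toy data standing in for the shopping ratings (not the original file)
set.seed(1)
toy <- data.frame(
  x = c(rnorm(10, mean = 2), rnorm(10, mean = 6)),
  y = c(rnorm(10, mean = 2), rnorm(10, mean = 6))
)

# nstart = 10 runs 10 random starts and keeps the solution with the lowest tot.withinss
fit <- kmeans(toy, centers = 2, nstart = 10)

# tot.withinss is the sum of the per-cluster within sums of squares
fit$tot.withinss
sum(fit$withinss)

# the ratio that kmeans prints as "between_SS / total_SS"
fit$betweenss / fit$totss

# the same sum for the 4-cluster shopping solution, using the printed values
sum(c(0.0, 20.5, 10.0, 34.0))  # 64.5
```

A ratio close to 1 means most of the variation lies between clusters rather than inside them; it always increases as more clusters are added, so it is not a criterion for choosing the number of clusters on its own.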
check the average rating of each cluster, and the cluster size
result$centers
##        Fun Budget Eating_out Best_buys Dont_care Compare_price
## 1 3.000000  7.000   2.000000     6.000     4.000      3.000000
## 2 1.666667  3.000   1.833333     3.500     5.500      3.333333
## 3 3.600000  5.600   3.600000     6.000     3.400      6.600000
## 4 5.750000  3.625   6.000000     3.125     1.875      3.875000
result$size
## [1] 1 6 5 8
obtain the group of each customer and merge to the previous data
result$cluster
##  [1] 4 2 4 3 2 4 4 4 2 3 2 4 2 3 4 3 4 1 3 2
shop_new<-cbind(shop,cluster=result$cluster) # name the new column "cluster" instead of "result$cluster"
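Once the labels are merged, the per-cluster averages and sizes can be recomputed from the merged data as a sanity check. The following is a minimal sketch on hypothetical toy data (the data frame `toy` and its values are made up for illustration; only the column names `caseno`, `Fun` and `Budget` mirror the shopping file). It assumes dplyr 1.0 or later for `across()`.

```r
library(dplyr)

# Hypothetical stand-in for shop: a caseno column plus two rating columns
set.seed(6)
toy <- data.frame(
  caseno = 1:12,
  Fun    = c(6, 7, 6, 7, 2, 1, 2, 1, 4, 4, 3, 4),
  Budget = c(2, 3, 2, 3, 6, 7, 6, 7, 4, 5, 4, 4)
)

# cluster on the rating columns only
fit <- kmeans(select(toy, -caseno), centers = 3, nstart = 10)

# attach the labels under an explicit column name
toy_new <- mutate(toy, cluster = fit$cluster)

# group sizes and means recomputed from the merged data;
# these should agree with fit$size and fit$centers
chk <- toy_new %>%
  group_by(cluster) %>%
  summarise(n = n(), across(c(Fun, Budget), mean))
chk
```

Keeping `caseno` in the merged data makes it possible to trace each customer back to a segment, while the grouped summary confirms that the labels line up with the reported centers.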

Comment: In the 4-cluster solution, the customers fall into four segments of sizes 1, 6, 5 and 8 (clusters 1 to 4, respectively). Here V1 to V6 denote the variables in column order: Fun, Budget, Eating_out, Best_buys, Dont_care and Compare_price. Cluster 1, a single customer, has the highest value on V2 and ties with cluster 3 for the highest value on V4, but has the lowest value on V6, which indicates that this customer is concerned about value for money and is willing to pay for good products. Cluster 2 has low values on V1, V2, V3, V4 and V6 but the highest value on V5, which indicates that these customers are not concerned about shopping. Cluster 3 has the highest value on V6 and ties for the highest value on V4; these customers are as concerned about value for money as cluster 1, but they care more about price. Cluster 4 has the highest values on V1 and V3 and the lowest value on V5; these customers enjoy shopping and like to combine it with eating out.
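The highest and lowest comparisons in the comment can be checked directly from the cluster means. The sketch below rebuilds the centers matrix from the values printed earlier (the object names `centers`, `highest` and `lowest` are illustrative) and reports, for each variable, which cluster has the largest and the smallest mean.

```r
# Cluster means copied from the printed kmeans output
centers <- matrix(
  c(3.000000, 7.000, 2.000000, 6.000, 4.000, 3.000000,
    1.666667, 3.000, 1.833333, 3.500, 5.500, 3.333333,
    3.600000, 5.600, 3.600000, 6.000, 3.400, 6.600000,
    5.750000, 3.625, 6.000000, 3.125, 1.875, 3.875000),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4),
                  c("Fun", "Budget", "Eating_out",
                    "Best_buys", "Dont_care", "Compare_price"))
)

# cluster with the highest and the lowest mean on each variable;
# which.max/which.min return the first index on ties, so the
# Best_buys tie between clusters 1 and 3 is reported as cluster 1
highest <- apply(centers, 2, which.max)
lowest  <- apply(centers, 2, which.min)
rbind(highest, lowest)
```

With the real fit, the same check can be run on `result$centers` instead of the hand-copied matrix.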