cluster analysis

Cluster Analysis

load the libraries

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

read the file “Shopping_cluster.csv” (modify to your own directory)

shop<-read.csv("~/Desktop/R/Shopping_cluster.csv",header=TRUE,sep=',')

select the variables for cluster analysis

shop_slt<-mutate(shop,caseno=NULL) # exclude caseno in the analysis

let us form 4 clusters

set.seed(6)  # fix the random seed so the randomly chosen starting centers are reproducible
result<-kmeans(shop_slt,4)
result

## K-means clustering with 4 clusters of sizes 1, 6, 5, 8
## 
## Cluster means:
##        Fun Budget Eating_out Best_buys Dont_care Compare_price
## 1 3.000000  7.000   2.000000     6.000     4.000      3.000000
## 2 1.666667  3.000   1.833333     3.500     5.500      3.333333
## 3 3.600000  5.600   3.600000     6.000     3.400      6.600000
## 4 5.750000  3.625   6.000000     3.125     1.875      3.875000
## 
## Clustering vector:
##  [1] 4 2 4 3 2 4 4 4 2 3 2 4 2 3 4 3 4 1 3 2
## 
## Within cluster sum of squares by cluster:
## [1]  0.0 20.5 10.0 34.0
##  (between_SS / total_SS =  80.4 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
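The ratio `between_SS / total_SS` reported in the output can be rebuilt from the listed components. The following is a minimal sketch only: it assumes simulated ratings (the data frame `ratings` is a hypothetical stand-in, not the shopping survey), so its numbers will not match the 80.4 % shown above.

```r
# Hypothetical stand-in data (NOT the shopping survey): 20 rows, 6 ratings from 1 to 7.
set.seed(6)
ratings <- as.data.frame(matrix(sample(1:7, 20 * 6, replace = TRUE), ncol = 6))
fit <- kmeans(ratings, centers = 4)

# Total SS: squared distances of every row from the overall column means.
total_ss <- sum(scale(ratings, scale = FALSE)^2)

# The total splits into a within-cluster part and a between-cluster part.
within_ss <- sum(fit$withinss)
between_ss <- total_ss - within_ss

# Share of the total variation captured by the clustering, in percent.
pct <- 100 * between_ss / total_ss
round(pct, 1)
```

A higher percentage means tighter clusters, but it always rises as more clusters are added, so it is best read alongside the cluster sizes rather than on its own.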

check the average rating of each cluster, and the cluster size

result$centers

##        Fun Budget Eating_out Best_buys Dont_care Compare_price
## 1 3.000000  7.000   2.000000     6.000     4.000      3.000000
## 2 1.666667  3.000   1.833333     3.500     5.500      3.333333
## 3 3.600000  5.600   3.600000     6.000     3.400      6.600000
## 4 5.750000  3.625   6.000000     3.125     1.875      3.875000

result$size

## [1] 1 6 5 8
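The sizes above include a cluster with a single customer, which may reflect either an unusual respondent or the particular random start. One way to probe this is to refit with several random starts and compare a few values of k. The sketch below assumes simulated ratings (`ratings`, `fits` and `summary_tbl` are hypothetical names, not part of the original analysis) and uses the `nstart` argument of `kmeans()`.

```r
# Hypothetical stand-in data (NOT the shopping survey): 20 rows, 6 ratings from 1 to 7.
set.seed(6)
ratings <- as.data.frame(matrix(sample(1:7, 20 * 6, replace = TRUE), ncol = 6))

# Fit k = 2..5; nstart = 25 tries 25 random starts per k and keeps the best fit.
ks <- 2:5
fits <- lapply(ks, function(k) kmeans(ratings, centers = k, nstart = 25))

# For each k: total within-cluster sum of squares and the smallest cluster size.
summary_tbl <- data.frame(
  k = ks,
  tot_withinss = sapply(fits, function(f) f$tot.withinss),
  smallest_cluster = sapply(fits, function(f) min(f$size))
)
summary_tbl
```

If a one-member cluster persists across many starts, it is more likely to reflect an outlying respondent than an unlucky initialisation.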

obtain the cluster of each customer and merge it into the original data

result$cluster

##  [1] 4 2 4 3 2 4 4 4 2 3 2 4 2 3 4 3 4 1 3 2

shop_new<-cbind(shop,cluster=result$cluster) # name the new column 'cluster'
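Once the cluster labels sit next to the ratings, the cluster means can be recomputed from the merged data as a check against the `centers` component. This is a minimal sketch on simulated data (`toy`, `toy_new` and `profile` are hypothetical names), written with base R's `aggregate()` so that it runs without extra packages.

```r
# Hypothetical stand-in data (NOT the shopping survey), with a caseno column.
set.seed(6)
toy <- data.frame(caseno = 1:20,
                  matrix(sample(1:7, 20 * 3, replace = TRUE), ncol = 3))
fit <- kmeans(toy[, -1], centers = 3)   # cluster on the ratings only

# Attach the cluster labels under an explicit column name.
toy_new <- cbind(toy, cluster = fit$cluster)

# Mean rating per cluster, recomputed from the merged data (caseno dropped).
profile <- aggregate(. ~ cluster, data = toy_new[, -1], FUN = mean)
profile

# Number of cases per cluster, which should agree with fit$size.
table(toy_new$cluster)
```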

Comment: In the 4-cluster solution, the customers fall into four segments of sizes 1, 6, 5 and 8 (clusters 1 to 4, respectively). Cluster 1, a single customer, has the highest value on Budget (V2), ties with cluster 3 for the highest value on Best_buys (V4), and has the lowest value on Compare_price (V6), which indicates that this customer cares about value for money and is willing to pay for good products. Cluster 2 has low values on Fun (V1), Budget, Eating_out (V3), Best_buys and Compare_price but the highest value on Dont_care (V5), which indicates that these customers are not concerned about shopping. Cluster 3 has the highest value on Compare_price and shares the highest value on Best_buys; these customers care about value for money as much as cluster 1 but pay more attention to price. Cluster 4 has the highest values on Fun and Eating_out and the lowest value on Dont_care; these customers enjoy shopping and like eating out while they shop.

pengkangjing

5/27/2021