Machine Learning - K-means Clustering

PART III: Segment Customers

1. Segment customers into 3 clusters

K-means clustering with 3 clusters of sizes 68, 68, 64

Cluster means:
      Gender        Age     Income SpendingScore
1 -0.8931925 -0.6482839 -0.1137065    0.52072909
2 -0.1552608  1.1171791 -0.2900782   -0.57380427
3  1.1139816 -0.4982011  0.4290213    0.05639239

Clustering vector:
  [1] 3 1 1 1 1 1 2 1 2 1 2 1 2 1 2 3 1 1 2 1 3 1 2 1 2 3 2 3 2 1 2 1 2 3 2
 [36] 1 2 1 2 1 2 3 2 1 2 1 2 1 1 3 2 3 1 2 2 2 2 2 1 2 2 3 2 2 2 3 1 2 3 1
 [71] 2 2 2 2 2 3 1 3 1 2 2 3 2 2 1 3 2 1 1 2 2 3 3 1 1 3 2 1 3 3 3 2 2 3 2
[106] 1 2 2 2 2 2 1 1 3 3 1 2 2 2 2 3 1 1 1 1 1 3 3 2 3 2 3 1 1 3 1 2 3 3 1
[141] 2 3 1 1 3 1 3 1 1 3 3 3 3 1 2 1 3 1 3 1 2 3 3 3 3 1 3 1 3 3 3 3 3 3 2
[176] 1 2 3 2 3 1 3 3 1 1 3 2 3 2 1 1 1 3 1 2 1 3 3 3 3

Within cluster sum of squares by cluster:
[1] 135.6445 169.6290 166.6704
 (between_SS / total_SS =  40.7 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"    
[5] "tot.withinss" "betweenss"    "size"         "iter"        
[9] "ifault"      
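
The "between_SS / total_SS" figure printed above can be recomputed by hand from the within-cluster sums of squares. The following is a minimal Python sketch (the report's own output appears to come from R), assuming all four columns were standardized with the sample standard deviation, so that each column contributes n - 1 to the total sum of squares. The variable names are illustrative only.

```python
# Values copied from the K = 3 output above.
sizes = [68, 68, 64]
withinss = [135.6445, 169.6290, 166.6704]

n_obs = sum(sizes)  # 200 customers
n_vars = 4          # Gender, Age, Income, SpendingScore

# Assumption: every column was standardized with the sample standard
# deviation, so each column contributes (n - 1) to the total sum of squares.
total_ss = (n_obs - 1) * n_vars

tot_withinss = sum(withinss)
between_ss = total_ss - tot_withinss
ratio = between_ss / total_ss

print(f"total_SS              = {total_ss}")
print(f"tot.withinss          = {tot_withinss:.4f}")
print(f"between_SS / total_SS = {100 * ratio:.1f} %")
```

Under that assumption the ratio comes out at 40.7 %, matching the printed value: three clusters account for well under half of the total variation in the standardized data.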

3. Using the Elbow Method to determine a new K value

 [1] 796.00000 585.63139 472.10785 379.41309 351.08214 317.41366 234.09948
 [8] 208.22430 168.40494 163.57368 137.32837 147.70344 134.74454 120.69258
[15] 109.38226 107.81616  92.64071  82.83343  91.45414  78.39783

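The values above are the total within-cluster sum of squares for K = 1 through 20. We take K = 7 as the optimal value: the total drops sharply from K = 6 (317.41) to K = 7 (234.10), then declines more slowly and unevenly.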
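
The elbow can also be read off numerically by looking at how much the total within-cluster sum of squares falls at each step. The sketch below, in Python, uses the 20 values printed above; the cutoff that skips the steep initial descent (K <= 4) is an illustrative choice, not part of the original analysis.

```python
# Total within-cluster sum of squares for K = 1..20, copied from the output above.
wss = [
    796.00000, 585.63139, 472.10785, 379.41309, 351.08214, 317.41366, 234.09948,
    208.22430, 168.40494, 163.57368, 137.32837, 147.70344, 134.74454, 120.69258,
    109.38226, 107.81616, 92.64071, 82.83343, 91.45414, 78.39783,
]

# Reduction gained by moving from K - 1 to K clusters.
drops = {k: wss[k - 2] - wss[k - 1] for k in range(2, len(wss) + 1)}

# Largest reduction once the steep initial descent is over.
# The cutoff of 4 is a hypothetical threshold chosen for illustration.
late = {k: d for k, d in drops.items() if k > 4}
best_k = max(late, key=late.get)

# K values where the total went up instead of down.
increases = [k for k, d in drops.items() if d < 0]

for k, d in drops.items():
    print(f"K = {k:2d}: reduction {d:8.2f}")
print("largest late reduction at K =", best_k)
print("total increased at K =", increases)
```

The largest late reduction is at K = 7, which agrees with the choice made here. The total also rises at K = 12 and K = 19. Since the best achievable total cannot increase as K grows, these bumps most likely reflect random starting centers; in R, the `nstart` argument of `kmeans()` runs several random starts and keeps the best one, which usually smooths the curve.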

4. Create a new visualization for the clustering result with K = 7

K-means clustering with 7 clusters of sizes 19, 47, 24, 37, 25, 25, 23

Cluster means:
      Gender        Age     Income SpendingScore
1  1.1139816 -0.4300874  1.0587747     1.2395880
2 -0.8931925  0.8256119 -0.3080906    -0.5037466
3 -0.8931925 -0.4038703  0.9257665     0.9764999
4 -0.8931925 -0.9295675 -0.8610049     0.4088039
5  0.8731207  0.0193285  1.1193722    -1.3228250
6  1.1139816  1.4138442 -0.4705932    -0.4399090
7  1.1139816 -0.9728057 -0.5311804     0.2448055

Clustering vector:
  [1] 7 4 4 4 4 4 2 4 6 4 6 4 2 4 2 7 4 4 6 4 7 4 2 4 2 7 2 7 2 4 6 4 6 7 2
 [36] 4 2 4 2 4 2 7 6 4 2 4 2 4 4 7 2 7 4 6 2 6 2 6 4 6 6 7 2 2 2 7 2 2 7 4
 [71] 6 6 2 2 6 7 2 7 4 2 6 7 6 2 4 6 2 4 4 2 2 7 6 2 4 7 2 4 6 7 7 2 6 7 2
[106] 4 2 6 6 6 6 4 2 7 7 4 2 2 2 2 7 2 3 3 4 3 5 1 6 1 5 1 4 3 5 3 2 1 5 3
[141] 2 1 3 3 5 3 5 3 2 1 5 1 5 3 2 3 5 3 5 3 2 1 5 1 5 3 5 3 5 1 5 1 5 1 2
[176] 3 5 1 5 1 3 1 5 3 3 1 2 1 5 3 5 3 5 3 5 3 5 1 5 1

Within cluster sum of squares by cluster:
[1] 12.83533 62.54858 19.60042 44.30267 39.09549 26.97428 21.82940
 (between_SS / total_SS =  71.5 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"    
[5] "tot.withinss" "betweenss"    "size"         "iter"        
[9] "ifault"      

PART IV: Interpret the Results

1. Assign a label to each cluster

  Group.1 CustomerID Gender      Age   Income SpendingScore Cluster
1       1  164.52632   1.00 32.84211 88368.42      82.21053       1
2       2   81.87234   0.00 50.38298 52468.09      37.19149       2
3       3  159.33333   0.00 33.20833 84875.00      75.41667       3
4       4   49.62162   0.00 25.86486 37945.95      60.75676       4
5       5  166.20000   0.88 39.12000 89960.00      16.04000       5
6       6   70.60000   1.00 58.60000 48200.00      38.84000       6
7       7   67.21739   1.00 25.26087 46608.70      56.52174       7

Cluster 1: Younger men. Make good money. Spend a lot.
Cluster 2: Middle-aged women. Make average money. Spend little.
Cluster 3: Younger women. Make good money. Spend a lot.
Cluster 4: Young women. Make little money. Spend average amount.
Cluster 5: Middle-aged (mostly men). Make good money. Spend VERY little.
Cluster 6: Older men. Make average money. Spend little.
Cluster 7: Young men. Make average money. Spend average amount.
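
The raw cluster means in the table above and the standardized cluster means in the K = 7 output describe the same centers on two different scales. A short Python sketch of that link follows, assuming Age was standardized as z = (x - mean) / sd and using the overall mean age of 38.85 reported under the next question. If the assumption holds, every cluster should imply the same standard deviation.

```python
overall_mean_age = 38.85  # overall mean age reported under the next question

# (standardized Age center, raw mean Age) per cluster, copied from the outputs above.
age = {
    1: (-0.4300874, 32.84211),
    2: (0.8256119, 50.38298),
    3: (-0.4038703, 33.20833),
    4: (-0.9295675, 25.86486),
    5: (0.0193285, 39.12000),
    6: (1.4138442, 58.60000),
    7: (-0.9728057, 25.26087),
}

# Assumption: z = (x - mean) / sd, so sd = (x - mean) / z for every cluster.
implied_sd = {k: (raw - overall_mean_age) / z for k, (z, raw) in age.items()}

for k, sd in implied_sd.items():
    print(f"cluster {k}: implied sd of Age = {sd:.3f}")

spread = max(implied_sd.values()) - min(implied_sd.values())
print(f"spread across clusters = {spread:.4f}")
```

All seven clusters imply a standard deviation of about 13.97 years, so a standardized Age center of 1.0 corresponds to a customer roughly 14 years older than average.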

2. How does the average age and gender distribution for each cluster compare to that of the overall data set?

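Overall mean Age:
[1] 38.85
Overall mean Gender (share of customers coded 1, which the cluster labels treat as men):
[1] 0.445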
  Group.1 CustomerID Gender      Age   Income SpendingScore Cluster
1       1  164.52632   1.00 32.84211 88368.42      82.21053       1
2       2   81.87234   0.00 50.38298 52468.09      37.19149       2
3       3  159.33333   0.00 33.20833 84875.00      75.41667       3
4       4   49.62162   0.00 25.86486 37945.95      60.75676       4
5       5  166.20000   0.88 39.12000 89960.00      16.04000       5
6       6   70.60000   1.00 58.60000 48200.00      38.84000       6
7       7   67.21739   1.00 25.26087 46608.70      56.52174       7

Overall, the mean age is 38.85 and 44.5% of customers are men (Gender = 1). Against that baseline:
Cluster 1: Below average in age. All men (100% vs. 44.5% overall).
Cluster 2: Above average in age. All women (0% men vs. 44.5% overall).
Cluster 3: Below average in age. All women (0% men).
Cluster 4: Far below average in age. All women (0% men).
Cluster 5: About average in age. Mostly men (88% vs. 44.5% overall).
Cluster 6: Far above average in age. All men (100%).
Cluster 7: Far below average in age. All men (100%).
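
The comparison with the overall data set can be made explicit with a few lines of arithmetic. The Python sketch below rebuilds the overall means as size-weighted averages of the cluster means and then prints each cluster's deviation. It assumes Gender is coded 1 = male and 0 = female, as the cluster labels in this report imply.

```python
# Cluster sizes from the K = 7 output; means from the aggregate table above.
sizes = [19, 47, 24, 37, 25, 25, 23]
mean_age = [32.84211, 50.38298, 33.20833, 25.86486, 39.12000, 58.60000, 25.26087]
# Assumption: Gender is coded 1 = male, 0 = female, so the mean is the share of men.
share_men = [1.00, 0.00, 0.00, 0.00, 0.88, 1.00, 1.00]

n = sum(sizes)
overall_age = sum(s * a for s, a in zip(sizes, mean_age)) / n
overall_men = sum(s * g for s, g in zip(sizes, share_men)) / n
print(f"overall: age {overall_age:.2f}, share of men {overall_men:.3f}")

age_diff = {}
men_diff = {}
for k, (a, g) in enumerate(zip(mean_age, share_men), start=1):
    age_diff[k] = a - overall_age
    men_diff[k] = g - overall_men
    print(
        f"cluster {k}: age {age_diff[k]:+6.2f} years, "
        f"share of men {100 * men_diff[k]:+6.1f} points"
    )
```

The weighted averages reproduce the overall figures of 38.85 and 0.445, which confirms that the cluster table and the overall means describe the same 200 customers.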

3. Based on the results of your work, what recommendations would you make to Acme Holdings?

Based purely on spending scores, Acme Holdings can take one of two marketing approaches: leverage its higher spenders, or try to raise the spending of its low spenders. We suggest the former, which would entail marketing to the four clusters with the highest mean spending scores: clusters 1, 3, 4, and 7 (about 57 to 82, versus 16 to 39 for the remaining clusters). These clusters include both men and women, with mean ages between roughly 25 and 33. Assuming that people in this age range are buying homes and starting families, Acme Holdings should consider marketing campaigns built around home improvement and starting a family.
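
The selection of target clusters can be reproduced by ranking the clusters on mean spending score. The Python sketch below uses the cluster means from the aggregate table; keeping the top four clusters is an illustrative cutoff that matches the recommendation above, not a rule taken from the analysis.

```python
# cluster: (mean age, mean spending score), copied from the aggregate table.
clusters = {
    1: (32.84211, 82.21053),
    2: (50.38298, 37.19149),
    3: (33.20833, 75.41667),
    4: (25.86486, 60.75676),
    5: (39.12000, 16.04000),
    6: (58.60000, 38.84000),
    7: (25.26087, 56.52174),
}

# Rank clusters from highest to lowest mean spending score.
ranked = sorted(clusters, key=lambda k: clusters[k][1], reverse=True)

# Hypothetical cutoff: keep the four highest-spending clusters.
targets = sorted(ranked[:4])
target_ages = [clusters[k][0] for k in targets]

print("ranking by spending score:", ranked)
print("target clusters:", targets)
print(f"mean ages of targets: {min(target_ages):.1f} to {max(target_ages):.1f}")
```

There is a clear gap between the fourth-ranked cluster (about 57) and the fifth (about 39), which is what makes a cutoff at four clusters reasonable here.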

Elena Wenpei Huang

Due 11/20/2019