Part A - Hierarchical clustering

A2 Sub-dataset + Euclidean distance matrix

Yes, the data need to be scaled. annual_income and spending_score are measured on different ranges, so without scaling the income variable would dominate the Euclidean distances and the clusters would be driven mostly by income.
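
As a minimal sketch of this step, the R code below standardises two variables and builds Euclidean distance matrices before and after scaling. The data values are made up for illustration; only the column names annual_income and spending_score come from the analysis.

```r
# Made-up values for illustration; only the column names come from the report
customers <- data.frame(
  annual_income  = c(15, 16, 54, 87, 120),
  spending_score = c(39, 81, 50, 18, 79)
)

# scale() centres each column to mean 0 and rescales it to standard deviation 1
scaled <- scale(customers)

# Euclidean distance matrices before and after scaling
d_raw    <- dist(customers, method = "euclidean")
d_scaled <- dist(scaled, method = "euclidean")

# After scaling, both columns have the same spread
round(apply(scaled, 2, sd), 3)
```

Comparing d_raw with d_scaled shows how much the unscaled distances depend on whichever variable has the larger spread.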

A3 Dendrogram and Heatmap

A4

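Yes, the dendrogram provides evidence of clustering. There are clear branches splitting the data into groups, and the merges near the top of the tree occur at much greater heights than the earlier ones. The biggest jump in height occurs before the final merge, which means the groups joined at that point are quite dissimilar from each other.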
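
The "jump in height" argument can be checked numerically. The sketch below is an illustration under assumptions: the data are simulated, and the ward.D2 linkage is a guess, since the linkage method used in this analysis is not stated.

```r
set.seed(1)
# Simulated two-variable data with three separated groups (an assumption)
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# Hierarchical clustering on scaled Euclidean distances
hc <- hclust(dist(scale(x)), method = "ward.D2")  # linkage is an assumption
plot(hc, labels = FALSE)                          # draws the dendrogram

# hc$height holds the merge heights in increasing order;
# large gaps between successive heights point to well-separated groups
jumps <- diff(hc$height)
tail(round(jumps, 2), 3)   # gaps between the last few merges
```

A large gap near the end of jumps corresponds to the tall vertical branches at the top of the dendrogram.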

A5 5-Cluster solution

Silhouette of 200 units in 5 clusters from silhouette.default(x = clusters1, dist = d1) :
 Cluster sizes and average silhouette widths:
       23        21        79        39        38 
0.5065849 0.6199558 0.6140651 0.5094809 0.4623933 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.1401  0.4993  0.5965  0.5531  0.6610  0.7645 

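The 5-cluster solution has a mean silhouette width of 0.55, which suggests reasonably clear separation. It is not perfect (the minimum width of -0.14 shows that a few customers sit closer to a neighbouring cluster), but it is strong enough to justify 5 segments.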
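
For reference, output of this kind can be produced along the following lines. This is a sketch, not the exact code behind the results above: the data are simulated, the ward.D2 linkage is an assumption, and hc1 is a hypothetical name. The names clusters1 and d1 are taken from the silhouette call shown in the output.

```r
library(cluster)  # provides silhouette()

set.seed(42)
# Simulated stand-in for the scaled income/spending data (an assumption)
centres <- rbind(c(-1.5, -1.5), c(-1.5, 1.5), c(0, 0), c(1.5, 1.5), c(1.5, -1.5))
x <- centres[rep(1:5, each = 30), ] + matrix(rnorm(300, sd = 0.3), ncol = 2)

d1  <- dist(x)                          # Euclidean distance matrix
hc1 <- hclust(d1, method = "ward.D2")   # linkage method is an assumption
clusters1 <- cutree(hc1, k = 5)         # cut the tree into 5 clusters

sil1 <- silhouette(clusters1, dist = d1)
summary(sil1)$avg.width                 # overall mean silhouette width
sum(sil1[, "sil_width"] < 0)            # units closer to another cluster than their own
```

Units with a negative silhouette width are the ones worth inspecting when judging how clean the segmentation is.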

A6 Scatterplot

The scatterplot shows the 5 customer segments in terms of income and spending behaviour. The clear visual separation is consistent with the hierarchical clustering results and suggests genuine segmentation within the dataset.

Table

Customer Profile Summary by Cluster

Cluster   # Customers   Avg Income (€000s)   Avg Spending Score   Avg Age
C1        23            26.30                20.91                45.2
C2        21            25.10                80.05                25.3
C3        79            54.42                50.22                42.9
C4        39            86.54                82.13                32.7
C5        38            87.00                18.63                40.4

  • C1: 23 customers, avg income = 26.3, avg spend = 20.9, avg age = 45.2

  • C2: 21 customers, avg income = 25.1, avg spend = 80, avg age = 25.3

  • C3: 79 customers, avg income = 54.4, avg spend = 50.2, avg age = 42.9

  • C4: 39 customers, avg income = 86.5, avg spend = 82.1, avg age = 32.7

  • C5: 38 customers, avg income = 87, avg spend = 18.6, avg age = 40.4
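
A profile summary like this can be computed by averaging each variable within each cluster. The sketch below uses made-up rows, and the column names (cluster, annual_income, spending_score, age) are assumptions rather than the names used in the original data.

```r
# Made-up rows; the column names are assumptions, not the report's actual names
customers <- data.frame(
  cluster        = c(1, 1, 2, 2, 3, 3),
  annual_income  = c(20, 30, 24, 26, 50, 60),
  spending_score = c(15, 25, 78, 82, 45, 55),
  age            = c(44, 46, 24, 26, 40, 46)
)

# Mean of each profiling variable within each cluster
profile <- aggregate(cbind(annual_income, spending_score, age) ~ cluster,
                     data = customers, FUN = mean)

# Number of customers per cluster, added as a column
profile$n <- as.vector(table(customers$cluster))
profile
```

Building the bullet summaries from a single table like profile avoids transcription differences between the table and the text.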

Barchart % Males vs Female per Cluster

Warning: The following aesthetics were dropped during statistical transformation: fill.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

The bar chart shows a fairly balanced gender split across clusters, with females making up the larger share in every cluster.
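
The "aesthetics were dropped" warning reported for this chart often appears when a numeric variable is mapped to fill in a bar chart. One possible fix, sketched below, is to convert the variables to factors before plotting. The data are made up, and the 0/1 coding of gender is an assumption about how the original column was stored.

```r
library(ggplot2)

# Made-up rows; the numeric 0/1 coding of gender is an assumption
customers <- data.frame(
  cluster = c(1, 1, 1, 2, 2, 3, 3, 3),
  gender  = c(0, 1, 1, 0, 1, 1, 1, 0)
)

# Convert numeric codes to factors so ggplot can group the bars by them
customers$cluster <- factor(customers$cluster)
customers$gender  <- factor(customers$gender, levels = c(0, 1),
                            labels = c("Male", "Female"))

# Row-wise shares: proportion of each gender within each cluster
shares <- prop.table(table(customers$cluster, customers$gender), margin = 1)
round(100 * shares, 1)

# Stacked bars scaled to 100% per cluster
p <- ggplot(customers, aes(x = cluster, fill = gender)) +
  geom_bar(position = "fill") +
  labs(x = "Cluster", y = "Share of customers", fill = "Gender")
print(p)
```

The shares table gives the exact percentages behind the bars, which is useful when stating which gender forms the majority in each cluster.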

A8 Summary

  • C1: “Low-Budget Occasional Shoppers” Low income and low spending, with the oldest average age (45.2). They are probably price-sensitive and unengaged.

Action: “€10 dinner deals”, loyalty points on staples, weekly value bundles (essentials), and SMS offers timed around paydays.

  • C2: “Young Big Spenders” The youngest cluster (average age 25.3), with a high spending score despite a low income (they spend a lot relative to their income).

Action: meal-deal subscriptions, app-only rewards, impulse-friendly promotions (snacks, drinks, and prepared meals), student/young adult benefits, and “spend €X get €Y off next visit.”

  • C3: “Mainstream Regulars” The largest group, with mid-range income and spending and a relatively high average age (42.9).

Action: maintain their loyalty with customised coupons, family-sized packages, “buy more save more”, seasonal promotions, and enhanced convenience (click-and-collect deals).

  • C4: “Premium Power Shoppers” Relatively young (average age 32.7), with high income and the highest spending score.

Action: premium experiences such as wine and cheese deals, upscale product lines, VIP tiers, early-access promotions, premium delivery slots, and tailored suggestions.

  • C5: “High Income, Low Engagement” The highest income but the lowest spending score, so they have the capacity to spend more than they currently do.

Action: win-back + convenience: highlight quality/freshness, curated baskets, free delivery threshold promotions, targeted “try us” offers, and premium convenience positioning.

Part B - K-means clustering

No, scaling is not needed here. All 6 variables are measured on the same 1-4 agreement scale, so no single variable will overpower another in the distance calculations.
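
A quick way to confirm that the variables share a scale is to inspect their ranges. The sketch below uses made-up responses; the column names follow the k-means output, and the data frame name survey is hypothetical.

```r
# Made-up responses on a 1-4 agreement scale; "survey" is a hypothetical name
survey <- data.frame(
  pos_impact = c(4, 3, 2, 4),
  environ    = c(4, 3, 1, 3),
  money      = c(4, 4, 2, 3)
)

# Minimum (row 1) and maximum (row 2) of every column
ranges <- sapply(survey, range)
ranges

# TRUE if every variable stays within the 1-4 scale
all(ranges[1, ] >= 1 & ranges[2, ] <= 4)
```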

4 Run 3 cluster k-means

  [1] 1 2 1 1 1 1 1 2 1 2 2 1 1 2 2 2 1 1 1 1 2 1 2 1 2 1 1 2 1 1 2 2 1 2 1 1 2
 [38] 1 2 1 2 2 1 1 1 1 2 1 2 1 2 1 1 2 2 2 2 2 1 2 1 1 2 1 2 2 2 2 2 1 1 2 1 1
 [75] 1 2 2 2 2 1 1 2 1 1 2 2 1 2 2 1 2 2 1 1 1 2 1 1 1 2 1 1 1 2 2 1 1 2 1 2 2
[112] 2 1 2 2 1 2 1 2 1 2 2 2 1 1 2 2 2 1 2 2 2 1 1 1 1 2 1 2 2 2 2 1 2 3 3 3 3
[149] 2 1 1 1 2 1 1 1 1 1 2 2 2 2 2 1 2 1 1 2 1 1 1 1 2 1 3 3 3 3 3 1 2 2 2 2 1
[186] 2 1 1 2 1 1 1 3 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 2 2 2 1 2 3 1 2 2 1 2 2
[223] 1 2 2 2 2 2 2 2 1 2 1 2 2 2 1 1 1 2 1 1 1 2 2 2 2 1 1 2 2 1 1 2 1 2 2 2 1
[260] 2 1 3 1 1 2 2 1 2 1 2 2 3 2 2 1 1 3 1 2 2 1 1 1 2 2 2 3 3 3 3 3 3 1 1 2 1
[297] 2 1 2 2 1 1 2 1 1 2 1 2 2 1 2 1 1 1 1 2 2 1 1 2 1 2 2 2 2 2 2 2 1 2 2 1 2
[334] 2 1 1 1 1 1 2 1 3 3 3 3 3 3 3 3 1 1 2 1 1 1 2 1 3 3 3 3 3 3 3 1 1
  pos_impact  environ    money     bins    local avoid_waste
1   3.846154 3.674556 3.928994 3.118343 2.976331    1.337278
2   3.586420 3.481481 3.827160 2.617284 2.969136    3.104938
3   1.800000 1.228571 1.657143 1.114286 2.285714    2.657143
[1] 169 162  35
[1] 3

5 Assess quality

Silhouette of 366 units in 3 clusters from silhouette.default(x = kmeans1$cluster, dist = d1) :
 Cluster sizes and average silhouette widths:
      169       162        35 
0.2682631 0.2054575 0.3271905 
Individual silhouette widths:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.08256  0.14303  0.23963  0.24610  0.35704  0.52470 
  • With an average silhouette width of roughly 0.25, this 3-cluster solution is weak to moderate at best. The clusters exist but overlap considerably, which is quite typical for attitude surveys.
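
The k-means run and the silhouette assessment can be sketched as follows. This is an illustration under assumptions: the data are simulated 1-4 responses, and nstart = 25 is an assumed setting that is not stated in this report. The names kmeans1 and d1 are taken from the silhouette call shown in the output.

```r
library(cluster)  # provides silhouette()

set.seed(123)
# Simulated 1-4 survey responses for 6 items (a stand-in, not the real data)
x <- matrix(sample(1:4, 600, replace = TRUE), ncol = 6)

# nstart = 25 reruns k-means from 25 random starts and keeps the best fit
kmeans1 <- kmeans(x, centers = 3, nstart = 25)
kmeans1$size      # cluster sizes
kmeans1$centers   # mean of each item within each cluster

# Silhouette widths based on Euclidean distances between respondents
d1   <- dist(x)
sil1 <- silhouette(kmeans1$cluster, dist = d1)
summary(sil1)$clus.avg.widths   # average silhouette width per cluster
summary(sil1)$avg.width         # overall average
```

Because k-means depends on its random starting centres, fixing the seed and using several starts makes the cluster sizes and silhouette values reproducible.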

6 Profile clusters

Barchart showing %

Recycling Attitudes & Perceptions by Cluster (Mean Scores)

Cluster   # Consumers   Pos Impact   Environment   Saves Money   Confused by Bins   Know Local Centre   Avoiding Packaging is Difficult
C1        169           3.85         3.67          3.93          3.12               2.98                1.34
C2        162           3.59         3.48          3.83          2.62               2.97                3.10
C3        35            1.80         1.23          1.66          1.11               2.29                2.66

7 Summary

C1: “Eco Believers, Need Guidance” (169 consumers) Strongly pro-recycling (highest scores for positive impact, environment, and saving money), yet the most confused by bin rules (highest bins score, 3.12). Strategy: more advanced clarity aids, such as fridge magnets, QR codes on bins, searchable “where does this go?” guides, and targeted communications about the most commonly mis-sorted items.

C2: “Willing but Unsure” (162 consumers) They support recycling but are somewhat unclear about the bins, and they are the group that finds avoiding packaging most difficult (3.10). Strategy: a clarity campaign with bin labels, short videos, “this goes here” graphics, school/community seminars, and uniform labelling across bins.

C3: “Recycling Sceptics” (35 consumers) They do not really believe recycling is beneficial (low scores for positive impact, environment, and saving money). Strategy: simple “why it matters locally” messaging, local proof (before/after, community impact), and easy entry actions (one or two essential items to recycle first), delivered through trusted local voices.