Executive Summary

Goal. Provide a clear, non-technical overview of distinct groups in the wine dataset so a business audience (e.g., brand/portfolio managers) can understand what differentiates each group and how to act on it.

Audience. A non-technical manager who needs defensible insights rather than algorithms.

What this report delivers.

  • A transparent workflow (libraries → data import → choose k → fit clusters → visualize → interpret).
  • A recommended number of clusters and profiles that describe each cluster in plain language.
  • Practical uses and caveats for decision-making.

Get All the Libraries

# Core
library(tidyverse)
library(janitor)
library(skimr)

# Clustering & validation
library(cluster)
library(factoextra)   # viz & helpers
library(NbClust)      # multiple indices (optional; can be slow)

# Tables & viz
library(broom)
library(gt)
library(patchwork)

Data Import and Exploration

# Adjust the path if needed; we assume the file sits next to this Rmd
raw <- readr::read_csv("winedata.csv") %>% clean_names()

# Peek
glimpse(raw)
## Rows: 6,497
## Columns: 14
## $ x1                   <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ fixed_acidity        <dbl> 7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5…
## $ volatile_acidity     <dbl> 0.700, 0.880, 0.760, 0.280, 0.700, 0.660, 0.600, …
## $ citric_acid          <dbl> 0.00, 0.00, 0.04, 0.56, 0.00, 0.00, 0.06, 0.00, 0…
## $ residual_sugar       <dbl> 1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0, 6.1,…
## $ chlorides            <dbl> 0.076, 0.098, 0.092, 0.075, 0.076, 0.075, 0.069, …
## $ free_sulfur_dioxide  <dbl> 11, 25, 15, 17, 11, 13, 15, 15, 9, 17, 15, 17, 16…
## $ total_sulfur_dioxide <dbl> 34, 67, 54, 60, 34, 40, 59, 21, 18, 102, 65, 102,…
## $ density              <dbl> 0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0.9978, 0…
## $ p_h                  <dbl> 3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30, 3.39, 3…
## $ sulphates            <dbl> 0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47, 0…
## $ alcohol              <dbl> 9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, 9.5, 10.…
## $ quality              <dbl> 5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 5, 5, 5, 5, 5, 5, 7…
## $ type                 <chr> "red", "red", "red", "red", "red", "red", "red", …
skimr::skim(raw)
Data summary
Name raw
Number of rows 6497
Number of columns 14
_______________________
Column type frequency:
character 1
numeric 13
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
type 0 1 3 5 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
x1 0 1 3249.00 1875.67 1.00 1625.00 3249.00 4873.00 6497.00 ▇▇▇▇▇
fixed_acidity 0 1 7.22 1.30 3.80 6.40 7.00 7.70 15.90 ▂▇▁▁▁
volatile_acidity 0 1 0.34 0.16 0.08 0.23 0.29 0.40 1.58 ▇▂▁▁▁
citric_acid 0 1 0.32 0.15 0.00 0.25 0.31 0.39 1.66 ▇▅▁▁▁
residual_sugar 0 1 5.44 4.76 0.60 1.80 3.00 8.10 65.80 ▇▁▁▁▁
chlorides 0 1 0.06 0.04 0.01 0.04 0.05 0.06 0.61 ▇▁▁▁▁
free_sulfur_dioxide 0 1 30.53 17.75 1.00 17.00 29.00 41.00 289.00 ▇▁▁▁▁
total_sulfur_dioxide 0 1 115.74 56.52 6.00 77.00 118.00 156.00 440.00 ▅▇▂▁▁
density 0 1 0.99 0.00 0.99 0.99 0.99 1.00 1.04 ▇▂▁▁▁
p_h 0 1 3.22 0.16 2.72 3.11 3.21 3.32 4.01 ▁▇▆▁▁
sulphates 0 1 0.53 0.15 0.22 0.43 0.51 0.60 2.00 ▇▃▁▁▁
alcohol 0 1 10.49 1.19 8.00 9.50 10.30 11.30 14.90 ▃▇▅▂▁
quality 0 1 5.82 0.87 3.00 5.00 6.00 6.00 9.00 ▁▆▇▃▁
# Separate potential ID / label columns from numeric features.
# Common label names in the classic wine dataset include 'class' or 'type'.
label_cols <- raw %>% select(where(~!is.numeric(.))) %>% names()

labels <- if (length(label_cols) > 0) raw %>% select(all_of(label_cols)) else NULL

# Numeric-only for clustering
num <- raw %>% select(where(is.numeric))

# Handle missing values (drop rows with NA in features for simplicity)
num <- num %>% drop_na()

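# Standardize features for fair distance computation.
# as.numeric() keeps each column a plain numeric vector; scale() alone returns one-column matrices.
num_scaled <- num %>% mutate(across(everything(), ~ as.numeric(scale(.x))))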

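Note: We cluster on standardized features so that variables measured on different scales contribute equally. As written, the numeric feature set still includes x1 (a row index running from 1 to 6,497) and quality (an outcome score), and all results below were produced with both included. Because x1 only encodes row order, consider dropping it before clustering and re-running the analysis.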
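The sketch below illustrates the two preparation steps on a small made-up data frame: removing an identifier column and z-scoring the remaining features. The column names (`id`, `alcohol`, `sulfur`) and values are hypothetical and are not taken from winedata.csv.

```r
# Toy data (made-up values); `id` stands in for a row-index column
toy <- data.frame(
  id      = 1:6,
  alcohol = c(9.4, 9.8, 10.0, 11.2, 12.5, 13.1),
  sulfur  = c(34, 67, 54, 102, 150, 170)
)

# Drop the identifier so row order cannot influence distances
features <- toy[, setdiff(names(toy), "id"), drop = FALSE]

# z-score each column: (value - column mean) / column sd
features_z <- as.data.frame(scale(features))

# After scaling, every column has mean 0 and standard deviation 1
round(colMeans(features_z), 10)
apply(features_z, 2, sd)
```

After this step a one-unit difference means "one standard deviation" in every column, so no single feature dominates the Euclidean distances that k-means relies on.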

Quick EDA

# Correlation heatmap (pairwise)
corr_mat  <- cor(num, use = "pairwise.complete.obs")
corr_long <- as.data.frame(as.table(corr_mat))
names(corr_long) <- c("Var1", "Var2", "value")

ggplot(corr_long, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1)) +
  coord_fixed() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Feature Correlation Heatmap", x = NULL, y = NULL, fill = "r")

# Top variable variances
tibble(variable = names(num), variance = map_dbl(num, var)) %>%
  arrange(desc(variance)) %>%
  head(10) %>%
  gt() %>%
  tab_header(title = md("**Top-Variance Features**"))
Top-Variance Features
variable               variance
x1                     3.518126e+06
total_sulfur_dioxide   3.194720e+03
free_sulfur_dioxide    3.150412e+02
residual_sugar         2.263670e+01
fixed_acidity          1.680740e+00
alcohol                1.422561e+00
quality                7.625748e-01
volatile_acidity       2.710517e-02
p_h                    2.585252e-02
sulphates              2.214319e-02

How Many Clusters?

We use three complementary approaches:

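  1. Elbow (within-cluster SSE): looks for the point of diminishing returns.
  2. Average silhouette: prefers tighter, well-separated groups.
  3. Gap statistic: a robustness check against a reference with no cluster structure. (NbClust is loaded as an optional extra but is not run in this report.)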
set.seed(123)  # k-means uses random starts; fix the seed so these plots are reproducible
p1 <- fviz_nbclust(num_scaled, kmeans, method = "wss", k.max = 10) + ggtitle("Elbow Method")
p2 <- fviz_nbclust(num_scaled, kmeans, method = "silhouette", k.max = 10) + ggtitle("Average Silhouette")
p1 + p2
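To make the elbow idea concrete, here is a minimal sketch that computes the within-cluster sum of squares by hand for several values of k. It uses synthetic data (three made-up blobs), not the wine file, and only base R.

```r
set.seed(42)

# Synthetic data: three well-separated blobs in two dimensions (illustrative only)
centers_true <- rbind(c(0, 0), c(5, 5), c(0, 5))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers_true[i, 1], 0.5),
        rnorm(50, centers_true[i, 2], 0.5))
}))

# Total within-cluster sum of squares for k = 1..6
wss <- sapply(1:6, function(k) kmeans(toy, centers = k, nstart = 20)$tot.withinss)

# Reduction in WSS gained by each extra cluster;
# the "elbow" is the k after which these reductions become small
drops <- -diff(wss)
data.frame(k = 2:6, wss_reduction = round(drops, 1))
```

On data like this the reductions are large up to k = 3 and small afterwards. Real data rarely shows such a clean bend, which is why the elbow is read alongside the other criteria.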

set.seed(123)
# Speed controls
n_obs    <- nrow(num_scaled)
max_rows <- 800           # sub-sample rows if dataset is large
B_fast   <- 20            # fewer bootstraps than the default 50
X_gap    <- if (n_obs > max_rows) dplyr::slice_sample(num_scaled, n = max_rows) else num_scaled

# Reasonable upper bound for k based on data size and dimensionality
kmax <- max(2, min(10, ncol(X_gap)))

# Fast/robust gap computation; if it errors or takes too long, we skip gracefully
gap <- tryCatch(
  clusGap(as.matrix(X_gap),
          FUN = kmeans,
          nstart = 25,
          K.max = kmax,
          B = B_fast,
          spaceH0 = "scaledPCA"),
  error = function(e) {
    message("Gap statistic skipped: ", e$message)
    NULL
  }
)
if (!is.null(gap)) {
  factoextra::fviz_gap_stat(gap) +
    ggtitle(paste0("Gap Statistic (subsampled; B = ", gap$B, ", kmax = ", nrow(gap$Tab), ")"))
} else {
  plot.new()
  title("Gap Statistic (skipped)")
  mtext("Used elbow + silhouette + (optional) NbClust to choose k.", side = 3, line = 0.5)
}

Decision on k: We combine the elbow, silhouette, and gap plots. Note that this file (6,497 wines with a red/white type column) is not the small three-cultivar "classic" wine dataset, so k = 3 should not be assumed from that dataset's known structure. Choose the k that best balances compactness and separation in your specific CSV. We proceed with k_best = 3 below.

# Choose k based on your inspection of the plots above.
# k_best <- 3 is the value used for the rest of this report.
k_best <- 3
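If a programmatic suggestion for k is wanted alongside the visual inspection, one option is to take the k with the highest average silhouette. The sketch below does this on synthetic blobs (made-up data); the name `k_suggested` is hypothetical and is not used elsewhere in this report.

```r
library(cluster)  # silhouette()

set.seed(42)

# Synthetic data: three well-separated blobs (illustrative only)
centers_true <- rbind(c(0, 0), c(5, 5), c(0, 5))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers_true[i, 1], 0.5),
        rnorm(50, centers_true[i, 2], 0.5))
}))

d <- dist(toy)

# Average silhouette width for k = 2..6 (silhouette is undefined for k = 1)
avg_sil <- sapply(2:6, function(k) {
  cl <- kmeans(toy, centers = k, nstart = 20)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})
names(avg_sil) <- 2:6

# k with the highest average silhouette
k_suggested <- as.integer(names(which.max(avg_sil)))
round(avg_sil, 2)
k_suggested
```

Treat the result as one input among several. When the silhouette curve is flat, a purely automatic choice can be unstable.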

Let’s Make Clusters

set.seed(1234)
km <- kmeans(num_scaled, centers = k_best, nstart = 50, iter.max = 50)
km
## K-means clustering with 3 clusters of sizes 1604, 1962, 2931
## 
## Cluster means:
##           x1 fixed_acidity volatile_acidity  citric_acid residual_sugar
## 1 -1.2794081     0.8698558        1.1673900 -0.330034429     -0.6134625
## 2  0.3309097    -0.1795612       -0.3394396  0.266846681      1.1358154
## 3  0.4786509    -0.3558340       -0.4116387  0.001986365     -0.4245909
##     chlorides free_sulfur_dioxide total_sulfur_dioxide    density         p_h
## 1  0.91468449         -0.83830632          -1.21545351  0.6967087  0.55452951
## 2 -0.08550749          0.80852466           0.94887336  0.7304700 -0.38111630
## 3 -0.44332590         -0.08245719           0.02998905 -0.8702500 -0.04835044
##    sulphates    alcohol    quality
## 1  0.8402120 -0.1063270 -0.2560702
## 2 -0.2602038 -0.8045438 -0.3015520
## 3 -0.2856295  0.5967463  0.3419930
## 
## Clustering vector:
##    [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 1
##  [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [223] 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [334] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [371] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [408] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [445] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [482] 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [519] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [556] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 3
##  [593] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [630] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [704] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [741] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [815] 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [926] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [963] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1000] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1037] 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1074] 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1111] 1 1 1 1 3 1 1 1 1 1 1 1 3 1 1 1 3 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1148] 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1185] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1222] 1 1 1 1 1 1 1 3 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1259] 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1
## [1296] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1333] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1370] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1407] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1444] 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1
## [1481] 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1518] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1555] 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1592] 1 1 1 1 1 1 1 1 2 3 3 2 2 3 2 2 3 3 3 3 3 3 2 3 3 3 3 2 3 3 3 1 3 2 3 2 3
## [1629] 3 2 3 3 3 2 3 3 3 2 2 2 2 2 3 3 3 2 2 2 2 3 3 3 3 2 3 2 2 3 3 2 2 3 3 3 3
## [1666] 3 2 3 2 2 2 2 3 3 2 3 3 2 1 3 2 2 2 2 2 2 2 2 2 2 2 3 3 3 2 2 3 2 2 2 2 2
## [1703] 2 2 2 2 2 2 3 2 2 2 2 2 1 3 3 2 2 3 2 2 3 3 3 3 2 3 3 3 2 2 2 2 2 3 2 3 3
## [1740] 3 2 3 3 3 3 3 1 3 3 3 2 3 3 1 2 2 3 3 3 3 2 3 2 2 2 2 3 2 2 3 3 3 3 2 3 3
## [1777] 2 1 2 2 2 2 2 2 2 2 3 3 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 1 3 3 3 2 3
## [1814] 3 2 2 2 2 2 2 2 3 3 3 2 3 2 3 2 2 2 2 2 2 2 2 2 3 2 2 3 3 2 2 3 3 3 3 2 2
## [1851] 2 2 3 3 3 3 3 3 3 3 2 3 2 3 2 2 2 3 3 2 2 2 2 2 2 2 3 3 3 3 3 2 2 2 2 2 2
## [1888] 2 2 2 3 2 3 2 3 2 3 2 3 3 3 3 3 2 2 2 2 3 3 3 3 3 2 2 2 3 2 3 3 3 2 3 2 3
## [1925] 2 2 3 2 3 3 3 3 2 3 3 3 2 3 3 2 3 2 3 3 3 3 2 2 2 3 3 3 3 2 2 2 3 3 3 2 3
## [1962] 3 2 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 2 3 3 3 2 2 3 3 2
## [1999] 3 3 2 3 2 3 2 3 3 3 3 2 2 3 3 2 2 3 2 3 3 3 2 2 2 2 2 2 2 3 2 2 3 2 1 3 3
## [2036] 3 3 2 3 3 3 3 2 2 3 3 2 2 3 2 3 3 3 3 2 3 2 3 2 2 2 2 3 2 2 2 3 2 2 2 2 3
## [2073] 3 3 2 3 3 3 3 2 3 2 2 2 3 3 3 3 2 3 3 2 3 3 3 2 3 3 2 2 2 3 3 2 2 1 3 3 3
## [2110] 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 2 2 3 2 2 3 2
## [2147] 3 3 2 2 3 3 2 3 3 2 3 2 3 2 3 2 3 2 2 3 2 2 2 2 3 2 3 3 3 3 3 3 3 2 3 2 2
## [2184] 3 2 2 3 3 2 2 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3 2 2 2 1 3 3 2 2 2 2 3 3 3 3 2
## [2221] 2 3 3 3 2 2 2 2 3 2 2 3 2 3 3 3 2 2 2 3 2 2 2 2 2 3 2 2 2 2 2 2 2 3 3 3 3
## [2258] 3 2 3 3 2 3 2 3 3 2 3 2 2 3 3 3 2 2 2 3 3 3 3 3 2 2 3 2 3 1 2 3 3 2 2 2 2
## [2295] 2 3 2 2 2 2 3 3 3 3 3 2 3 2 3 2 3 3 2 3 3 2 2 3 3 2 3 3 3 3 3 3 2 3 1 2 2
## [2332] 3 2 2 3 2 2 3 3 3 3 3 2 2 3 2 3 2 3 3 2 2 2 3 3 2 2 3 3 2 2 2 2 2 3 2 3 2
## [2369] 2 3 3 2 2 3 3 3 2 2 2 3 2 2 2 2 2 2 2 3 2 2 3 2 3 2 2 2 2 3 3 2 2 2 2 3 2
## [2406] 2 2 2 2 2 3 3 2 2 3 3 2 3 2 3 2 2 3 3 2 2 3 3 3 1 3 3 3 1 3 3 3 3 3 3 2 2
## [2443] 3 3 3 2 3 2 3 3 2 3 2 3 3 2 2 2 2 3 2 3 3 3 3 3 2 2 3 2 2 3 3 3 3 3 3 3 3
## [2480] 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 2 3 2 2 2 3 3 1 3 2 2 3 3 3 3 1 3 3
## [2517] 3 3 3 2 2 2 2 3 3 3 3 2 2 3 2 2 2 2 2 3 2 2 2 2 2 3 3 2 3 2 3 1 3 3 2 3 3
## [2554] 2 3 3 3 3 2 2 3 2 3 2 3 3 2 3 3 3 3 2 3 3 2 3 2 3 3 3 3 3 3 3 3 2 2 3 3 3
## [2591] 1 2 3 3 3 3 2 2 3 3 2 2 3 3 3 3 3 3 3 3 3 3 2 2 3 2 3 3 2 3 3 3 2 3 3 3 1
## [2628] 3 2 3 2 2 2 2 3 1 1 3 3 1 3 3 2 3 3 3 3 3 3 2 2 3 1 3 3 3 2 3 2 3 2 2 2 3
## [2665] 2 2 3 3 3 3 2 3 2 2 3 2 3 2 2 3 2 2 2 3 2 3 3 2 3 2 2 3 3 2 3 2 3 2 3 2 3
## [2702] 3 3 2 3 3 3 3 2 3 3 2 3 1 3 3 2 3 2 2 3 3 3 3 2 3 3 3 3 3 3 2 3 3 2 3 3 2
## [2739] 3 3 2 2 3 3 2 3 3 3 2 2 2 1 3 2 3 2 2 2 2 3 2 3 2 3 3 3 3 3 3 3 2 3 3 2 2
## [2776] 2 3 2 2 2 2 3 3 3 2 3 3 3 3 3 3 2 2 2 2 3 2 2 3 3 3 2 3 3 2 2 2 3 3 3 2 3
## [2813] 3 3 2 3 2 3 3 3 3 3 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 2
## [2850] 3 3 3 3 2 3 3 2 2 2 2 3 3 2 3 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 3 3 2 3 3 3 3
## [2887] 3 3 3 3 2 3 3 3 2 3 3 3 3 2 2 2 2 2 3 3 2 3 3 2 3 3 2 2 2 3 3 3 2 3 3 3 3
## [2924] 3 3 2 3 3 3 3 2 2 3 3 3 3 2 2 2 3 3 2 2 3 3 3 3 3 3 3 3 3 2 2 3 2 2 3 3 3
## [2961] 3 3 3 3 3 3 2 2 2 3 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2
## [2998] 2 3 2 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3
## [3035] 2 2 3 2 2 2 3 3 3 2 3 3 3 3 3 2 3 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2
## [3072] 3 3 3 3 2 3 3 3 3 3 3 3 2 3 3 2 2 2 3 3 2 3 3 3 1 3 3 3 3 3 3 3 3 3 2 2 2
## [3109] 3 3 3 3 2 3 3 2 2 3 3 3 3 3 2 2 2 3 2 3 2 2 3 2 3 2 3 3 3 3 3 2 3 3 3 3 3
## [3146] 2 2 3 3 2 2 3 3 2 3 2 2 2 3 3 3 3 3 3 3 2 3 2 2 2 2 2 3 2 2 3 1 2 2 2 3 3
## [3183] 2 2 2 2 2 2 3 3 2 3 3 2 2 3 2 3 3 2 2 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 2 3
## [3220] 3 2 2 3 3 2 2 2 3 2 3 3 3 3 2 3 2 3 2 3 2 2 2 2 2 2 2 3 3 2 3 2 3 2 2 3 3
## [3257] 2 2 2 2 2 3 2 2 3 3 3 3 3 2 3 2 2 2 2 3 3 3 3 2 2 2 2 2 2 2 2 2 3 2 2 3 2
## [3294] 2 3 2 3 3 3 3 2 2 2 2 3 3 2 1 2 3 3 3 3 2 3 3 3 2 3 3 3 2 3 3 2 3 2 2 3 3
## [3331] 2 3 3 2 3 3 2 3 3 2 3 2 2 2 2 3 2 3 3 2 3 2 3 3 2 2 2 3 2 2 3 3 2 2 2 2 2
## [3368] 3 3 2 3 2 2 3 2 2 3 3 3 2 3 2 1 3 2 3 3 3 3 2 3 3 2 3 2 3 3 2 2 2 3 2 2 2
## [3405] 2 3 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 2 2 2 2 2 2 2 2 3 2 2 3 2 2 3 3
## [3442] 2 2 3 2 2 2 2 3 3 3 3 3 3 2 1 3 3 2 3 2 3 2 3 1 2 2 3 2 3 3 2 2 2 2 3 3 2
## [3479] 2 2 2 2 2 2 2 3 2 2 3 2 2 2 2 2 2 2 2 2 3 2 3 2 3 2 2 3 3 3 2 2 3 3 3 3 3
## [3516] 2 3 2 2 2 2 2 3 3 3 3 3 3 2 2 2 1 2 3 2 2 3 3 2 2 2 2 2 2 2 2 2 3 2 2 1 3
## [3553] 2 2 2 3 2 2 3 3 2 3 2 2 2 3 2 3 3 3 3 2 2 2 2 2 2 3 2 2 2 2 2 2 2 3 2 2 3
## [3590] 3 2 3 3 2 2 2 2 2 2 3 3 3 3 3 2 2 3 3 3 3 3 3 2 3 3 3 3 3 2 3 3 2 3 2 2 1
## [3627] 2 2 2 3 2 3 2 3 3 3 2 3 3 2 3 3 3 2 3 2 2 2 2 2 2 2 3 3 3 2 3 2 2 2 3 3 3
## [3664] 2 3 3 3 3 3 3 3 2 2 2 3 3 3 3 2 3 2 2 2 3 3 3 2 3 3 2 2 3 2 2 2 2 2 2 3 2
## [3701] 3 2 3 3 2 2 2 2 2 2 2 2 3 2 2 3 3 3 3 2 3 2 2 3 2 2 2 3 3 3 2 2 3 3 2 3 2
## [3738] 3 2 2 2 2 3 2 2 3 2 2 3 3 2 3 3 2 2 3 3 3 3 3 3 1 3 2 3 3 2 2 2 2 2 2 2 2
## [3775] 2 2 3 2 3 2 3 2 2 2 2 1 3 2 2 3 2 2 2 2 3 3 3 2 2 2 3 2 3 3 2 2 3 3 3 3 3
## [3812] 3 3 3 2 3 3 3 3 3 2 3 2 3 2 2 2 2 3 2 3 3 3 3 2 2 2 3 2 2 2 2 2 2 3 2 3 3
## [3849] 2 2 3 2 2 3 3 3 2 2 2 2 3 3 2 2 2 2 2 2 2 3 2 3 3 3 2 2 3 2 2 3 3 2 2 2 2
## [3886] 2 2 2 3 3 3 3 3 3 2 3 2 3 3 3 2 2 2 3 3 3 2 2 3 2 3 2 3 3 2 3 2 2 3 3 3 3
## [3923] 3 2 3 2 3 2 2 2 3 2 3 2 2 2 2 2 3 3 2 3 3 3 2 2 3 3 2 2 2 3 3 3 3 3 3 2 2
## [3960] 2 3 3 2 2 3 2 2 2 3 3 2 3 3 3 3 3 3 2 2 2 3 3 3 3 2 3 3 3 3 3 2 3 3 2 2 2
## [3997] 3 3 3 3 3 2 3 2 2 3 3 2 3 2 2 2 3 2 3 2 2 3 2 3 2 2 3 2 3 3 2 3 2 2 2 2 2
## [4034] 2 2 2 3 2 2 2 3 2 2 2 2 2 2 3 3 2 2 3 3 2 2 2 2 2 2 2 3 3 2 3 2 2 3 3 3 2
## [4071] 3 3 2 3 1 2 3 2 2 3 2 2 2 2 2 2 3 3 2 3 2 2 2 2 3 2 2 2 3 2 2 3 3 2 2 2 3
## [4108] 2 2 2 2 3 3 3 3 3 2 3 2 2 3 3 3 2 3 3 3 3 2 3 2 2 2 3 3 2 2 2 3 2 3 2 2 3
## [4145] 3 2 3 2 2 3 2 2 3 3 3 2 3 3 3 3 2 3 3 3 2 3 3 2 3 3 3 3 2 3 2 2 2 3 2 2 2
## [4182] 2 2 2 2 2 3 2 1 2 3 3 2 3 3 2 3 2 2 3 2 3 3 3 3 3 2 2 3 2 2 3 2 3 2 2 2 3
## [4219] 2 2 2 3 2 3 2 3 3 2 3 3 3 2 2 2 3 2 2 3 3 3 2 3 3 2 3 3 3 2 2 2 3 3 2 2 2
## [4256] 2 3 2 3 2 3 3 3 3 2 3 3 1 2 3 3 2 3 3 3 3 3 3 2 3 2 3 3 3 3 3 2 2 2 3 3 3
## [4293] 3 3 3 2 3 3 3 3 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 3 3 3 2 3 3 3 2 3 3 2 3
## [4330] 2 2 3 3 2 2 3 3 2 3 2 2 2 3 3 3 3 2 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 3 2 3 3
## [4367] 2 3 3 2 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 3 2 2 2 2 2 2 3 2 3 3 2 3 3 2 2 3 3
## [4404] 3 3 2 2 2 3 3 3 3 3 3 3 3 3 2 3 2 2 2 3 2 3 2 3 3 2 2 2 3 3 3 3 2 2 3 3 3
## [4441] 3 3 3 3 3 3 3 3 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2
## [4478] 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 3 3 3 2 2 3 2 3 3 2 3 3 3 2 3 3 3 2 2 3 2 3
## [4515] 3 3 3 3 2 2 3 3 3 2 2 3 3 3 2 3 3 2 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 2 2 3 3
## [4552] 3 3 3 3 3 3 3 3 2 3 2 3 3 3 2 2 3 3 3 3 3 2 2 3 3 2 2 3 3 3 2 3 3 3 3 3 3
## [4589] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 2 2 3 2 3 3 3 3 3 3 2 2 3
## [4626] 3 2 3 3 3 2 3 2 2 3 3 2 3 2 2 2 2 2 2 3 3 3 3 2 2 2 2 3 3 3 3 3 2 3 2 2 3
## [4663] 2 2 2 2 3 3 3 3 3 2 3 2 3 2 2 3 3 3 2 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 3
## [4700] 3 3 3 3 3 3 3 3 2 2 3 3 3 2 3 3 3 3 3 3 2 3 3 3 2 2 3 3 3 3 3 2 3 3 3 2 3
## [4737] 3 3 2 2 3 3 3 2 2 2 2 2 2 3 2 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3
## [4774] 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 2
## [4811] 2 3 2 3 3 3 2 3 3 3 3 3 2 3 3 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [4848] 3 3 3 2 3 3 3 2 2 2 2 2 2 3 2 3 2 2 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3 3 2 3
## [4885] 3 3 3 2 2 3 3 2 3 3 2 2 2 3 3 3 3 3 3 3 3 3 2 2 3 3 3 3 3 2 3 3 3 3 3 2 3
## [4922] 3 3 3 3 3 3 3 3 2 2 3 3 3 2 2 2 3 3 3 3 3 2 2 2 2 3 3 2 3 3 3 3 3 3 2 3 3
## [4959] 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 2 3 3 2 3 3 3 3 3 3 2 2
## [4996] 2 3 2 3 3 3 3 3 3 3 3 2 2 3 3 3 2 2 3 3 2 3 2 3 2 3 3 2 3 2 2 2 3 2 2 2 3
## [5033] 3 3 3 3 2 2 2 3 3 3 3 2 3 2 2 3 3 3 3 2 3 3 3 3 3 3 2 2 3 3 3 2 3 3 2 3 2
## [5070] 3 2 3 3 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 3 3 3 3
## [5107] 3 2 2 2 3 3 3 3 3 3 3 3 3 2 3 2 2 3 3 3 2 3 3 2 2 3 2 2 2 3 3 3 3 3 3 2 3
## [5144] 3 2 2 2 3 3 3 2 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 2 3 3 3 3
## [5181] 3 3 3 3 3 3 2 2 2 3 3 3 3 3 3 3 2 3 2 2 3 3 3 3 2 2 3 2 2 2 3 3 2 2 3 2 3
## [5218] 2 2 2 2 3 2 3 2 3 2 2 2 3 3 2 3 3 3 3 3 3 3 3 2 2 3 3 3 3 3 3 3 3 2 3 2 3
## [5255] 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 2 3 2 3 3 2 3 2 2 2 3 3 3 3
## [5292] 3 3 2 3 3 2 3 2 3 2 2 2 2 2 3 3 2 3 3 2 2 2 3 2 3 3 2 2 3 3 3 3 3 3 3 3 3
## [5329] 3 2 2 3 3 2 3 3 2 3 2 2 2 2 2 2 2 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2
## [5366] 2 2 2 2 3 2 2 3 2 3 3 3 3 3 2 3 3 2 2 3 2 2 2 2 2 2 2 2 3 2 2 2 3 3 3 3 3
## [5403] 3 3 3 2 3 3 3 2 3 3 3 2 3 2 3 3 3 3 2 3 2 3 3 3 3 3 3 3 3 3 2 2 2 3 3 2 2
## [5440] 3 2 3 3 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 3 3 3 2 2 2 2 2 2 2 2 3 2
## [5477] 3 2 3 2 2 3 3 2 3 3 3 2 2 3 3 2 3 3 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3
## [5514] 3 3 3 2 3 3 3 2 3 3 3 3 2 3 2 2 3 3 3 3 3 2 3 2 3 3 2 3 3 3 3 3 3 3 3 2 3
## [5551] 2 3 3 2 3 3 3 3 3 2 2 3 3 2 3 3 2 2 3 2 3 3 2 2 3 3 3 3 3 2 3 3 3 3 2 2 2
## [5588] 3 2 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 2 3 3 3 3 2 2 3 2 3 2 3 2 3 2 2 3
## [5625] 3 2 3 3 3 3 3 2 3 2 3 3 3 3 3 2 2 2 2 2 2 2 2 2 3 2 3 2 2 2 3 2 3 2 3 3 3
## [5662] 3 3 3 3 3 2 3 2 2 3 2 3 2 2 2 3 3 3 2 2 3 3 3 3 3 3 3 3 2 3 3 2 3 2 3 3 3
## [5699] 3 2 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 2 2 3 3 2 2 2 3 3 2 2 2 2 3 3 2 2 3 3 3
## [5736] 3 2 3 3 2 2 3 2 2 2 2 3 2 3 2 3 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 2 3 3
## [5773] 2 2 3 2 3 3 2 2 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 3 3 3 2 3 2 3 3 3
## [5810] 3 2 2 3 2 2 2 3 3 2 3 3 3 3 3 2 2 3 2 3 3 3 3 3 3 2 2 3 2 3 2 3 3 3 2 2 3
## [5847] 3 3 3 2 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 3 2 2 2 2 3
## [5884] 3 3 3 3 3 2 2 2 3 2 3 2 3 3 3 2 2 2 2 3 3 3 2 3 3 3 2 3 3 3 3 3 2 3 3 3 2
## [5921] 2 3 3 3 2 2 2 3 2 2 2 2 2 2 2 2 2 2 3 2 2 3 3 2 3 2 3 2 2 3 3 3 3 3 3 3 3
## [5958] 2 2 3 3 3 2 3 2 3 2 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 2 2 3 3 2 3 2 2 2 2 2
## [5995] 2 2 2 2 2 2 3 3 2 2 2 3 3 2 3 2 3 3 3 3 2 2 3 2 2 2 2 3 3 2 3 3 2 3 3 2 3
## [6032] 3 2 3 3 3 3 3 3 2 3 2 3 3 3 3 3 3 2 2 2 3 3 2 2 2 2 3 3 2 3 3 3 2 2 3 2 3
## [6069] 3 3 3 3 1 2 3 3 2 2 3 2 2 3 3 3 3 3 3 3 3 3 3 3 2 3 2 3 2 3 3 3 2 3 3 3 2
## [6106] 3 3 3 3 3 3 3 3 2 3 3 3 3 3 2 3 3 2 2 3 2 3 3 2 2 2 3 2 3 2 2 3 3 3 3 3 3
## [6143] 3 3 3 3 3 2 3 3 3 3 3 3 2 3 2 3 3 3 3 3 3 3 2 2 2 3 3 3 3 3 3 3 3 3 2 3 2
## [6180] 2 3 3 2 3 3 3 3 3 3 3 2 3 2 3 3 3 3 2 3 3 3 2 3 3 3 3 2 3 3 3 3 2 2 2 3 3
## [6217] 3 3 2 3 3 3 3 3 3 2 3 3 3 2 3 2 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 2 3 3 3 2 3
## [6254] 2 2 2 2 3 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 2 3 2 3 3 3 3 3 3 2 3 2 2 2 2
## [6291] 2 3 2 2 3 3 3 2 2 2 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 2 3 2
## [6328] 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 2 3 2 2 3 2 3 3 2 3 3 3 3 3 3 2 3 3 3
## [6365] 3 3 3 2 2 2 2 2 3 3 3 3 3 2 3 3 3 2 2 2 3 3 3 2 3 3 2 3 2 2 3 3 3 2 3 3 3
## [6402] 3 3 3 3 2 3 3 3 2 3 3 2 3 3 2 3 3 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 2 3 3
## [6439] 3 3 3 3 3 3 3 3 3 3 2 3 2 3 3 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3
## [6476] 3 3 3 2 2 2 3 3 2 2 3 3 3 2 3 3 3 3 2 3 3 3
## 
## Within cluster sum of squares by cluster:
## [1] 18604.50 14887.52 21116.02
##  (between_SS / total_SS =  35.3 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
clustered <- num %>% mutate(cluster = factor(km$cluster))
cluster_sizes <- clustered %>% count(cluster, name = "n")
cluster_sizes %>% gt() %>% tab_header(title = md("**Cluster Sizes**"))
Cluster Sizes
cluster   n
1         1604
2         1962
3         2931

Cluster Centers (Standardized and Original Units)

# Standardized centroids (from kmeans)
centers_z <- as_tibble(km$centers, .name_repair = "minimal") %>%
  mutate(cluster = factor(1:n())) %>%
  relocate(cluster)

# Convert standardized centers back to original scale:
num_means <- sapply(num, mean)
num_sds   <- sapply(num, sd)

centers_orig <- centers_z
for (v in names(num)) {
  centers_orig[[v]] <- centers_z[[v]] * num_sds[[v]] + num_means[[v]]
}
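The back-transformation above relies on the identity original = z × sd + mean. The following sketch checks, on made-up data, that centers converted this way equal the per-cluster means computed directly on the unscaled values. The variable names are hypothetical.

```r
set.seed(1)

# Made-up data in original units (illustrative only)
toy <- data.frame(
  alcohol = c(rnorm(30, 9.5, 0.3), rnorm(30, 12.5, 0.3)),
  sugar   = c(rnorm(30, 8.0, 1.0), rnorm(30, 2.0, 0.5))
)

toy_z  <- scale(toy)
km_toy <- kmeans(toy_z, centers = 2, nstart = 20)

mu  <- colMeans(toy)
sdv <- apply(toy, 2, sd)

# original = z * sd + mean, applied column by column
centers_back <- sweep(sweep(km_toy$centers, 2, sdv, `*`), 2, mu, `+`)

# Per-cluster means computed directly on the unscaled data
centers_direct <- as.matrix(
  aggregate(toy, by = list(cluster = km_toy$cluster), FUN = mean)[, -1]
)

all.equal(unname(centers_back), unname(centers_direct))
```

The two agree because a k-means center is a mean, and means pass through a linear rescaling unchanged.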

centers_z %>% gt() %>% tab_header(title = md("**Cluster Centroids (Z-Scores)**"))
Cluster Centroids (Z-Scores)
cluster x1 fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulfur_dioxide total_sulfur_dioxide density p_h sulphates alcohol quality
1 -1.2794081 0.8698558 1.1673900 -0.330034429 -0.6134625 0.91468449 -0.83830632 -1.21545351 0.6967087 0.55452951 0.8402120 -0.1063270 -0.2560702
2 0.3309097 -0.1795612 -0.3394396 0.266846681 1.1358154 -0.08550749 0.80852466 0.94887336 0.7304700 -0.38111630 -0.2602038 -0.8045438 -0.3015520
3 0.4786509 -0.3558340 -0.4116387 0.001986365 -0.4245909 -0.44332590 -0.08245719 0.02998905 -0.8702500 -0.04835044 -0.2856295 0.5967463 0.3419930
centers_orig %>% gt() %>% tab_header(title = md("**Cluster Centroids (Original Units)**"))
Cluster Centroids (Original Units)
cluster x1 fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulfur_dioxide total_sulfur_dioxide density p_h sulphates alcohol quality
1 849.2569 8.343017 0.5318610 0.2706733 2.524501 0.08807855 15.64589 47.04489 0.9967858 3.307662 0.6562968 10.364983 5.594763
2 3869.6764 6.982518 0.2837819 0.3574108 10.847222 0.05303823 44.87615 169.37666 0.9968871 3.157222 0.4925484 9.532212 5.555046
3 4146.7895 6.753992 0.2718953 0.3189219 3.423115 0.04050256 29.06175 117.43961 0.9920870 3.210727 0.4887649 11.203547 6.117025

Variable Importance by Cluster (Profile Heatmap)

centers_long <- centers_z %>%
  pivot_longer(-cluster, names_to = "feature", values_to = "zscore")

ggplot(centers_long, aes(feature, cluster, fill = zscore)) +
  geom_tile() +
  scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick", midpoint = 0) +
  labs(title = "Cluster Profiles (Standardized Centers)",
       x = "Feature", y = "Cluster") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Visualize Clusters

PCA Map (Colored by Cluster)

pca <- prcomp(num_scaled, center = FALSE, scale. = FALSE)
pca_df <- as_tibble(pca$x[, 1:2]) %>%
  mutate(cluster = factor(km$cluster))

ggplot(pca_df, aes(PC1, PC2, color = cluster)) +
  geom_point(alpha = 0.8, size = 2.5) +
  stat_ellipse(type = "t", linetype = 2) +
  labs(title = "PCA Projection Colored by Cluster",
       subtitle = "Ellipses show approximate cluster spread",
       color = "Cluster")
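A two-dimensional PCA map shows only part of the variation in the data, so it helps to report how much the first two components retain. This is a minimal sketch on made-up features; the same calculation could be applied to a fitted prcomp object.

```r
set.seed(7)

# Made-up features: a and b are strongly correlated, c is independent noise
base_signal <- rnorm(200)
toy <- cbind(
  a = base_signal,
  b = base_signal + rnorm(200, sd = 0.3),
  c = rnorm(200)
)
toy_z <- scale(toy)

pca_toy <- prcomp(toy_z, center = FALSE, scale. = FALSE)

# Share of total variance carried by each principal component
var_share <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
round(var_share, 3)

# Share retained by the two-dimensional map (PC1 + PC2)
sum(var_share[1:2])
```

If the first two components carry only a small share, clusters that overlap on the map may still be separated along dimensions the plot does not show.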

Silhouette Plot

d <- dist(num_scaled)
sil <- silhouette(km$cluster, d)
factoextra::fviz_silhouette(sil) + ggtitle("Silhouette Plot")
##   cluster size ave.sil.width
## 1       1 1604          0.22
## 2       2 1962          0.22
## 3       3 2931          0.22
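For reference, the silhouette width of a point is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster. The sketch below works this out by hand for one point in a made-up six-point example and compares it with cluster::silhouette().

```r
library(cluster)  # silhouette()

# Six made-up points on a line, in two obvious groups
x  <- matrix(c(1, 2, 3, 10, 11, 12), ncol = 1)
cl <- c(1L, 1L, 1L, 2L, 2L, 2L)
d  <- as.matrix(dist(x))

i <- 1  # the point at x = 1

# a: mean distance to the other members of its own cluster -> (1 + 2) / 2 = 1.5
a_i <- mean(d[i, cl == cl[i] & seq_along(cl) != i])

# b: mean distance to the other cluster -> (9 + 10 + 11) / 3 = 10
# (with more than two clusters, b is the smallest such mean)
b_i <- mean(d[i, cl != cl[i]])

s_i <- (b_i - a_i) / max(a_i, b_i)  # (10 - 1.5) / 10 = 0.85

# Same value as reported by the package
sil_toy <- silhouette(cl, dist(x))
c(by_hand = s_i, from_package = unname(sil_toy[i, "sil_width"]))
```

Values near 1 mean a point sits well inside its cluster, values near 0 mean it lies between clusters, and negative values suggest it may be assigned to the wrong one.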

The silhouette averages (~0.22) indicate modest separation: clusters are somewhat diffuse and overlapping. This suggests the wine chemistry data forms a continuum rather than distinct groupings. For managerial use, these clusters should be interpreted as soft style groupings (‘lighter’, ‘balanced’, ‘richer’) rather than hard categorical classes.

Interpretation: Personas & Business Takeaways

# Rank features by relative importance per cluster (absolute z-score)
top_features <- centers_long %>%
  group_by(cluster) %>%
  slice_max(order_by = abs(zscore), n = 5, with_ties = FALSE) %>%
  arrange(cluster, desc(abs(zscore)))

top_features %>%
  group_by(cluster) %>%
  summarise(top5 = paste0(feature, " (", sprintf("%.2f", zscore), "z)", collapse = ", ")) %>%
  gt() %>%
  tab_header(title = md("**Top-Defining Features (by Cluster)**")) %>%
  cols_label(top5 = "Most Distinctive Signals")
Top-Defining Features (by Cluster)
cluster Most Distinctive Signals
1 x1 (-1.28z), total_sulfur_dioxide (-1.22z), volatile_acidity (1.17z), chlorides (0.91z), fixed_acidity (0.87z)
2 residual_sugar (1.14z), total_sulfur_dioxide (0.95z), free_sulfur_dioxide (0.81z), alcohol (-0.80z), density (0.73z)
3 density (-0.87z), alcohol (0.60z), x1 (0.48z), chlorides (-0.44z), residual_sugar (-0.42z)

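How to read this. Positive z-scores = above average for that feature; negative = below average. Use these signals to name clusters in terms the data supports (for example, “Sweeter & Lower-Alcohol” or “Drier & Higher-Alcohol”). Note that x1 is a row index, not a chemistry measure, so its appearance in the table reflects file order rather than wine style.

Example guidance (illustrative names; validate against tasting and sales data before use):

  • Cluster 1, “Acidic & Low-Sulfur”: higher volatile acidity (1.17z), chlorides (0.91z), and fixed acidity (0.87z), with low total sulfur dioxide (-1.22z). Its average row index is about 849, so its members sit mostly in the early rows of the file, which the data preview labels as red. Implication: treat this as the red-leaning segment and position it separately from the other two groups.
  • Cluster 2, “Sweeter & Lower-Alcohol”: high residual sugar (1.14z) and sulfur dioxide, low alcohol (-0.80z), and higher density (0.73z). Implication: a candidate for easy-drinking, off-dry positioning.
  • Cluster 3, “Drier & Higher-Alcohol”: low density (-0.87z), higher alcohol (0.60z), and the highest average quality score (about 6.1, versus about 5.6 for the other two clusters). Implication: a candidate for the upper pricing tier.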

Tie these to audience goals (e.g., SKU rationalization, pricing ladders, portfolio storytelling).

Validation & Sensitivity

  • Internal validation: Average silhouette (higher is better), within-cluster SSE, and PCA separation are reasonable first checks. Here the average silhouette is about 0.22 and between_SS / total_SS is 35.3%, so the structure is modest rather than strong.
  • Sensitivity checks: Try alternative k (e.g., 2 or 4) and confirm that business narratives remain stable. Re-run k-means with different random seeds (keeping nstart large) to check that assignments are stable.
  • Data risks: If labels exist (here, type = red/white), do they align with clusters? If so, the clusters may largely be rediscovering known classes.
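The seed-stability and label-alignment checks can be sketched as follows. The data, the `label` vector, and the helper `fit_with_seed()` are all made up for illustration; with the real data one would substitute the scaled features and the type column.

```r
set.seed(42)

# Made-up data with a known two-level label (illustrative only)
label <- rep(c("red", "white"), each = 60)
toy <- cbind(
  acidity = c(rnorm(60, 8.3, 0.4), rnorm(60, 6.9, 0.4)),
  sulfur  = c(rnorm(60, 45, 10),   rnorm(60, 140, 15))
)
toy_z <- scale(toy)

# Hypothetical helper: fit k-means under a given seed and return assignments
fit_with_seed <- function(seed) {
  set.seed(seed)
  kmeans(toy_z, centers = 2, nstart = 10)$cluster
}

run_a <- fit_with_seed(1)
run_b <- fit_with_seed(2)

# Stability: cluster numbers can swap between runs, so compare with a cross-tab.
# A stable solution puts almost all rows in one cell per row of the table.
stability_tab <- table(run_a, run_b)
stability_tab

# Alignment with the known label: one dominant cluster per label means
# the clusters mostly reproduce the existing classes
label_tab <- table(label, cluster = run_a)
label_tab
```

If the label cross-tab is nearly diagonal, the segmentation adds little beyond the label itself, and it may be more informative to cluster within each label separately.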
list(
  k_chosen = k_best,
  avg_silhouette = mean(sil[, "sil_width"]),
  tot_withinss = km$tot.withinss,
  betweenss = km$betweenss
)
## $k_chosen
## [1] 3
## 
## $avg_silhouette
## [1] 0.2179186
## 
## $tot_withinss
## [1] 54608.04
## 
## $betweenss
## [1] 29839.96
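As a quick arithmetic check, the figures printed above reproduce the 35.3% reported in the k-means summary. For z-scored data the total sum of squares should also equal (n - 1) × p, here with n = 6,497 rows and p = 13 numeric columns.

```r
# Values copied from the output above
tot_withinss <- 54608.04
betweenss    <- 29839.96

# Total sum of squares is the sum of the two parts
totss <- tot_withinss + betweenss
totss

# Share of total variation explained by the cluster assignment, in percent
pct_between <- 100 * betweenss / totss
round(pct_between, 1)

# Cross-check: each standardized column contributes (n - 1) to the total
(6497 - 1) * 13
```

Roughly a third of the variation is between clusters and two thirds remains within them, which is consistent with the modest silhouette values.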

Conclusion

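What we learned. A k = 3 solution splits the wines into three overlapping groups with different average feature profiles (average silhouette about 0.22; between_SS / total_SS = 35.3%). These profiles can be named and used as soft style groupings for portfolio strategy, pricing tiers, and storytelling.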

How this helps our audience. Instead of raw chemistry variables, we deliver clear personas that inform go-to-market choices. Managers can: (1) prioritize SKUs that anchor each persona, (2) tailor messaging to the profile, and (3) align pricing with perceived value.