Goal. Provide a clear, non-technical overview of distinct groups in the wine dataset so a business audience (e.g., brand/portfolio managers) can understand what differentiates each group and how to act on it.
Audience. A non-technical manager who needs defensible insights rather than algorithmic detail.
What this report delivers.
- A transparent workflow (libraries → data import → choose k → fit clusters → visualize → interpret).
- A recommended number of clusters and profiles that describe each cluster in plain language.
- Practical uses and caveats for decision-making.
# Core
library(tidyverse)
library(janitor)
library(skimr)
# Clustering & validation
library(cluster)
library(factoextra) # viz & helpers
library(NbClust) # multiple indices (optional; can be slow)
# Tables & viz
library(broom)
library(gt)
library(patchwork)
# Adjust the path if needed; we assume the file sits next to this Rmd
raw <- readr::read_csv("winedata.csv") %>% clean_names()
# Peek
glimpse(raw)
## Rows: 6,497
## Columns: 14
## $ x1 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ fixed_acidity <dbl> 7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5…
## $ volatile_acidity <dbl> 0.700, 0.880, 0.760, 0.280, 0.700, 0.660, 0.600, …
## $ citric_acid <dbl> 0.00, 0.00, 0.04, 0.56, 0.00, 0.00, 0.06, 0.00, 0…
## $ residual_sugar <dbl> 1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0, 6.1,…
## $ chlorides <dbl> 0.076, 0.098, 0.092, 0.075, 0.076, 0.075, 0.069, …
## $ free_sulfur_dioxide <dbl> 11, 25, 15, 17, 11, 13, 15, 15, 9, 17, 15, 17, 16…
## $ total_sulfur_dioxide <dbl> 34, 67, 54, 60, 34, 40, 59, 21, 18, 102, 65, 102,…
## $ density <dbl> 0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0.9978, 0…
## $ p_h <dbl> 3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30, 3.39, 3…
## $ sulphates <dbl> 0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47, 0…
## $ alcohol <dbl> 9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, 9.5, 10.…
## $ quality <dbl> 5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 5, 5, 5, 5, 5, 5, 7…
## $ type <chr> "red", "red", "red", "red", "red", "red", "red", …
skimr::skim(raw)
| Name | raw |
| Number of rows | 6497 |
| Number of columns | 14 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 13 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| type | 0 | 1 | 3 | 5 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| x1 | 0 | 1 | 3249.00 | 1875.67 | 1.00 | 1625.00 | 3249.00 | 4873.00 | 6497.00 | ▇▇▇▇▇ |
| fixed_acidity | 0 | 1 | 7.22 | 1.30 | 3.80 | 6.40 | 7.00 | 7.70 | 15.90 | ▂▇▁▁▁ |
| volatile_acidity | 0 | 1 | 0.34 | 0.16 | 0.08 | 0.23 | 0.29 | 0.40 | 1.58 | ▇▂▁▁▁ |
| citric_acid | 0 | 1 | 0.32 | 0.15 | 0.00 | 0.25 | 0.31 | 0.39 | 1.66 | ▇▅▁▁▁ |
| residual_sugar | 0 | 1 | 5.44 | 4.76 | 0.60 | 1.80 | 3.00 | 8.10 | 65.80 | ▇▁▁▁▁ |
| chlorides | 0 | 1 | 0.06 | 0.04 | 0.01 | 0.04 | 0.05 | 0.06 | 0.61 | ▇▁▁▁▁ |
| free_sulfur_dioxide | 0 | 1 | 30.53 | 17.75 | 1.00 | 17.00 | 29.00 | 41.00 | 289.00 | ▇▁▁▁▁ |
| total_sulfur_dioxide | 0 | 1 | 115.74 | 56.52 | 6.00 | 77.00 | 118.00 | 156.00 | 440.00 | ▅▇▂▁▁ |
| density | 0 | 1 | 0.99 | 0.00 | 0.99 | 0.99 | 0.99 | 1.00 | 1.04 | ▇▂▁▁▁ |
| p_h | 0 | 1 | 3.22 | 0.16 | 2.72 | 3.11 | 3.21 | 3.32 | 4.01 | ▁▇▆▁▁ |
| sulphates | 0 | 1 | 0.53 | 0.15 | 0.22 | 0.43 | 0.51 | 0.60 | 2.00 | ▇▃▁▁▁ |
| alcohol | 0 | 1 | 10.49 | 1.19 | 8.00 | 9.50 | 10.30 | 11.30 | 14.90 | ▃▇▅▂▁ |
| quality | 0 | 1 | 5.82 | 0.87 | 3.00 | 5.00 | 6.00 | 6.00 | 9.00 | ▁▆▇▃▁ |
# Separate potential ID / label columns from numeric features.
# Common label names in the classic wine dataset include 'class' or 'type'.
label_cols <- raw %>% select(where(~!is.numeric(.))) %>% names()
labels <- if (length(label_cols) > 0) raw %>% select(all_of(label_cols)) else NULL
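# Numeric-only for clustering.
# Caution: this keeps x1 (which looks like a row index) and quality (an outcome
# score) as clustering features; consider dropping them, e.g. select(-x1).
num <- raw %>% select(where(is.numeric))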
# Handle missing values (drop rows with NA in features for simplicity)
num <- num %>% drop_na()
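# Standardize features for fair distance computation.
# as.numeric() keeps each column a plain vector; scale() alone returns a one-column matrix.
num_scaled <- num %>% mutate(across(everything(), ~ as.numeric(scale(.x))))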
Note: We cluster on standardized features to ensure variables measured on different scales contribute equally.
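To make the standardization step concrete, here is a minimal base-R sketch on made-up values (the `toy` data frame is an assumption, not taken from the CSV). It shows that a z-score is simply (value - mean) / sd, that `scale()` gives the same result, and that every column ends up with mean 0 and standard deviation 1.

```r
# Toy data: two features on very different scales (values are made up)
toy <- data.frame(
  total_sulfur_dioxide = c(34, 67, 54, 160, 170),
  p_h                  = c(3.51, 3.20, 3.26, 3.00, 3.60)
)

# Z-score by hand: subtract the column mean, divide by the column sd
z_manual <- as.data.frame(lapply(toy, function(x) (x - mean(x)) / sd(x)))

# scale() does the same thing; as.numeric() drops the matrix attributes
z_scale <- as.data.frame(lapply(toy, function(x) as.numeric(scale(x))))

# After scaling, every column has mean 0 and sd 1
round(sapply(z_manual, mean), 10)
sapply(z_manual, sd)

# Raw distances are dominated by the large-scale column;
# scaled distances let both columns contribute
raw_dist    <- as.matrix(dist(toy))
scaled_dist <- as.matrix(dist(z_manual))
```

Comparing `raw_dist` with `scaled_dist` shows how much the sulfur dioxide column would drive the distances if left unscaled.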
# Correlation heatmap (pairwise)
corr_mat <- cor(num, use = "pairwise.complete.obs")
corr_long <- as.data.frame(as.table(corr_mat))
names(corr_long) <- c("Var1", "Var2", "value")
ggplot(corr_long, aes(Var1, Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(limits = c(-1, 1)) +
coord_fixed() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Feature Correlation Heatmap", x = NULL, y = NULL, fill = "r")
# Top variable variances
tibble(variable = names(num), variance = map_dbl(num, var)) %>%
arrange(desc(variance)) %>%
head(10) %>%
gt() %>%
tab_header(title = md("**Top-Variance Features**"))
| Top-Variance Features | |
| variable | variance |
|---|---|
| x1 | 3.518126e+06 |
| total_sulfur_dioxide | 3.194720e+03 |
| free_sulfur_dioxide | 3.150412e+02 |
| residual_sugar | 2.263670e+01 |
| fixed_acidity | 1.680740e+00 |
| alcohol | 1.422561e+00 |
| quality | 7.625748e-01 |
| volatile_acidity | 2.710517e-02 |
| p_h | 2.585252e-02 |
| sulphates | 2.214319e-02 |
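The variable x1 tops the table above. Judging from the glimpse and summary output (values 1, 2, 3, … up to 6,497, the row count), it looks like a row index rather than a chemical measurement. The sketch below shows one way to detect and set aside such columns before clustering; the helper `is_row_index` and the toy values are hypothetical, and whether to drop x1 in this report is a judgment call for the analyst.

```r
# Hypothetical helper: TRUE if a column is just 1, 2, ..., n
is_row_index <- function(x) {
  is.numeric(x) && length(x) > 1 && !anyNA(x) && all(x == seq_along(x))
}

# Toy stand-in for the wine table (values are made up)
toy <- data.frame(
  x1      = 1:5,
  alcohol = c(9.4, 9.8, 9.8, 10.0, 9.5),
  quality = c(5, 5, 6, 7, 5)
)

# Names of ID-like columns, and the remaining feature columns
id_cols  <- names(toy)[vapply(toy, is_row_index, logical(1))]
features <- toy[, setdiff(names(toy), id_cols), drop = FALSE]

id_cols          # "x1"
names(features)  # "alcohol" "quality"
```

If the file is sorted (for example, all reds first), an index column can act as a hidden proxy for that ordering, which is one reason to keep it out of the distance calculation.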
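We use three complementary approaches: the elbow method (total within-cluster sum of squares), the average silhouette width, and the gap statistic.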
p1 <- fviz_nbclust(num_scaled, kmeans, method = "wss", k.max = 10) + ggtitle("Elbow Method")
p2 <- fviz_nbclust(num_scaled, kmeans, method = "silhouette", k.max = 10) + ggtitle("Average Silhouette")
p1 + p2
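For readers who want to see what the elbow plot is built from, here is a minimal base-R sketch on synthetic data (three artificial blobs, not the wine data). It computes the total within-cluster sum of squares for each k and the improvement gained by each extra cluster; the "elbow" is where that improvement flattens out.

```r
set.seed(42)

# Toy data: three well-separated 2-D blobs (synthetic, for illustration only)
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# Drop in WSS gained by each extra cluster; the elbow is where this flattens
data.frame(k = k_values,
           wss = round(wss, 1),
           drop = c(NA, round(-diff(wss), 1)))
```

On data this clean the drop collapses after k = 3; on real data such as the wine CSV the bend is usually much softer, which is why the report combines several criteria.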
set.seed(123)
# Speed controls
n_obs <- nrow(num_scaled)
max_rows <- 800 # sub-sample rows if dataset is large
B_fast <- 20 # fewer bootstraps than the default 50
X_gap <- if (n_obs > max_rows) dplyr::slice_sample(num_scaled, n = max_rows) else num_scaled
# Reasonable upper bound for k based on data size and dimensionality
kmax <- max(2, min(10, ncol(X_gap)))
# Fast/robust gap computation; if it errors or takes too long, we skip gracefully
gap <- tryCatch(
clusGap(as.matrix(X_gap),
FUN = kmeans,
nstart = 25,
K.max = kmax,
B = B_fast,
spaceH0 = "scaledPCA"),
error = function(e) {
message("Gap statistic skipped: ", e$message)
NULL
}
)
if (!is.null(gap)) {
factoextra::fviz_gap_stat(gap) +
ggtitle(paste0("Gap Statistic (subsampled; B = ", gap$B, ", kmax = ", nrow(gap$Tab), ")"))
} else {
plot.new()
title("Gap Statistic (skipped)")
mtext("Used elbow + silhouette + (optional) NbClust to choose k.", side = 3, line = 0.5)
}
Decision on k: We combine the elbow, silhouette, and gap results. For the classic wine dataset, k = 3 is often supported, but we choose the k that best balances compactness and separation in your specific CSV. We proceed using the chosen `k_best` below.
# Choose k based on your inspection of the plots above.
# If unsure, set k_best <- 3 (commonly sensible for wine).
k_best <- 3
set.seed(1234)
km <- kmeans(num_scaled, centers = k_best, nstart = 50, iter.max = 50)
km
## K-means clustering with 3 clusters of sizes 1604, 1962, 2931
##
## Cluster means:
## x1 fixed_acidity volatile_acidity citric_acid residual_sugar
## 1 -1.2794081 0.8698558 1.1673900 -0.330034429 -0.6134625
## 2 0.3309097 -0.1795612 -0.3394396 0.266846681 1.1358154
## 3 0.4786509 -0.3558340 -0.4116387 0.001986365 -0.4245909
## chlorides free_sulfur_dioxide total_sulfur_dioxide density p_h
## 1 0.91468449 -0.83830632 -1.21545351 0.6967087 0.55452951
## 2 -0.08550749 0.80852466 0.94887336 0.7304700 -0.38111630
## 3 -0.44332590 -0.08245719 0.02998905 -0.8702500 -0.04835044
## sulphates alcohol quality
## 1 0.8402120 -0.1063270 -0.2560702
## 2 -0.2602038 -0.8045438 -0.3015520
## 3 -0.2856295 0.5967463 0.3419930
##
## Clustering vector:
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [223] 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [334] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [371] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [408] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [445] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [482] 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [519] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [556] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 3
## [593] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [630] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [704] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [741] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [815] 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1
## [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [926] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [963] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1000] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1037] 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1074] 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1111] 1 1 1 1 3 1 1 1 1 1 1 1 3 1 1 1 3 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1148] 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1185] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1222] 1 1 1 1 1 1 1 3 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1259] 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1
## [1296] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1333] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1370] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1407] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1444] 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1
## [1481] 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1518] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1555] 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1592] 1 1 1 1 1 1 1 1 2 3 3 2 2 3 2 2 3 3 3 3 3 3 2 3 3 3 3 2 3 3 3 1 3 2 3 2 3
## [1629] 3 2 3 3 3 2 3 3 3 2 2 2 2 2 3 3 3 2 2 2 2 3 3 3 3 2 3 2 2 3 3 2 2 3 3 3 3
## [1666] 3 2 3 2 2 2 2 3 3 2 3 3 2 1 3 2 2 2 2 2 2 2 2 2 2 2 3 3 3 2 2 3 2 2 2 2 2
## [1703] 2 2 2 2 2 2 3 2 2 2 2 2 1 3 3 2 2 3 2 2 3 3 3 3 2 3 3 3 2 2 2 2 2 3 2 3 3
## [1740] 3 2 3 3 3 3 3 1 3 3 3 2 3 3 1 2 2 3 3 3 3 2 3 2 2 2 2 3 2 2 3 3 3 3 2 3 3
## [1777] 2 1 2 2 2 2 2 2 2 2 3 3 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 1 3 3 3 2 3
## [1814] 3 2 2 2 2 2 2 2 3 3 3 2 3 2 3 2 2 2 2 2 2 2 2 2 3 2 2 3 3 2 2 3 3 3 3 2 2
## [1851] 2 2 3 3 3 3 3 3 3 3 2 3 2 3 2 2 2 3 3 2 2 2 2 2 2 2 3 3 3 3 3 2 2 2 2 2 2
## [1888] 2 2 2 3 2 3 2 3 2 3 2 3 3 3 3 3 2 2 2 2 3 3 3 3 3 2 2 2 3 2 3 3 3 2 3 2 3
## [1925] 2 2 3 2 3 3 3 3 2 3 3 3 2 3 3 2 3 2 3 3 3 3 2 2 2 3 3 3 3 2 2 2 3 3 3 2 3
## [1962] 3 2 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 2 3 3 3 2 2 3 3 2
## [1999] 3 3 2 3 2 3 2 3 3 3 3 2 2 3 3 2 2 3 2 3 3 3 2 2 2 2 2 2 2 3 2 2 3 2 1 3 3
## [2036] 3 3 2 3 3 3 3 2 2 3 3 2 2 3 2 3 3 3 3 2 3 2 3 2 2 2 2 3 2 2 2 3 2 2 2 2 3
## [2073] 3 3 2 3 3 3 3 2 3 2 2 2 3 3 3 3 2 3 3 2 3 3 3 2 3 3 2 2 2 3 3 2 2 1 3 3 3
## [2110] 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 2 2 3 2 2 3 2
## [2147] 3 3 2 2 3 3 2 3 3 2 3 2 3 2 3 2 3 2 2 3 2 2 2 2 3 2 3 3 3 3 3 3 3 2 3 2 2
## [2184] 3 2 2 3 3 2 2 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3 2 2 2 1 3 3 2 2 2 2 3 3 3 3 2
## [2221] 2 3 3 3 2 2 2 2 3 2 2 3 2 3 3 3 2 2 2 3 2 2 2 2 2 3 2 2 2 2 2 2 2 3 3 3 3
## [2258] 3 2 3 3 2 3 2 3 3 2 3 2 2 3 3 3 2 2 2 3 3 3 3 3 2 2 3 2 3 1 2 3 3 2 2 2 2
## [2295] 2 3 2 2 2 2 3 3 3 3 3 2 3 2 3 2 3 3 2 3 3 2 2 3 3 2 3 3 3 3 3 3 2 3 1 2 2
## [2332] 3 2 2 3 2 2 3 3 3 3 3 2 2 3 2 3 2 3 3 2 2 2 3 3 2 2 3 3 2 2 2 2 2 3 2 3 2
## [2369] 2 3 3 2 2 3 3 3 2 2 2 3 2 2 2 2 2 2 2 3 2 2 3 2 3 2 2 2 2 3 3 2 2 2 2 3 2
## [2406] 2 2 2 2 2 3 3 2 2 3 3 2 3 2 3 2 2 3 3 2 2 3 3 3 1 3 3 3 1 3 3 3 3 3 3 2 2
## [2443] 3 3 3 2 3 2 3 3 2 3 2 3 3 2 2 2 2 3 2 3 3 3 3 3 2 2 3 2 2 3 3 3 3 3 3 3 3
## [2480] 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 2 3 2 2 2 3 3 1 3 2 2 3 3 3 3 1 3 3
## [2517] 3 3 3 2 2 2 2 3 3 3 3 2 2 3 2 2 2 2 2 3 2 2 2 2 2 3 3 2 3 2 3 1 3 3 2 3 3
## [2554] 2 3 3 3 3 2 2 3 2 3 2 3 3 2 3 3 3 3 2 3 3 2 3 2 3 3 3 3 3 3 3 3 2 2 3 3 3
## [2591] 1 2 3 3 3 3 2 2 3 3 2 2 3 3 3 3 3 3 3 3 3 3 2 2 3 2 3 3 2 3 3 3 2 3 3 3 1
## [2628] 3 2 3 2 2 2 2 3 1 1 3 3 1 3 3 2 3 3 3 3 3 3 2 2 3 1 3 3 3 2 3 2 3 2 2 2 3
## [2665] 2 2 3 3 3 3 2 3 2 2 3 2 3 2 2 3 2 2 2 3 2 3 3 2 3 2 2 3 3 2 3 2 3 2 3 2 3
## [2702] 3 3 2 3 3 3 3 2 3 3 2 3 1 3 3 2 3 2 2 3 3 3 3 2 3 3 3 3 3 3 2 3 3 2 3 3 2
## [2739] 3 3 2 2 3 3 2 3 3 3 2 2 2 1 3 2 3 2 2 2 2 3 2 3 2 3 3 3 3 3 3 3 2 3 3 2 2
## [2776] 2 3 2 2 2 2 3 3 3 2 3 3 3 3 3 3 2 2 2 2 3 2 2 3 3 3 2 3 3 2 2 2 3 3 3 2 3
## [2813] 3 3 2 3 2 3 3 3 3 3 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 2
## [2850] 3 3 3 3 2 3 3 2 2 2 2 3 3 2 3 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 3 3 2 3 3 3 3
## [2887] 3 3 3 3 2 3 3 3 2 3 3 3 3 2 2 2 2 2 3 3 2 3 3 2 3 3 2 2 2 3 3 3 2 3 3 3 3
## [2924] 3 3 2 3 3 3 3 2 2 3 3 3 3 2 2 2 3 3 2 2 3 3 3 3 3 3 3 3 3 2 2 3 2 2 3 3 3
## [2961] 3 3 3 3 3 3 2 2 2 3 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2
## [2998] 2 3 2 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3
## [3035] 2 2 3 2 2 2 3 3 3 2 3 3 3 3 3 2 3 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2
## [3072] 3 3 3 3 2 3 3 3 3 3 3 3 2 3 3 2 2 2 3 3 2 3 3 3 1 3 3 3 3 3 3 3 3 3 2 2 2
## [3109] 3 3 3 3 2 3 3 2 2 3 3 3 3 3 2 2 2 3 2 3 2 2 3 2 3 2 3 3 3 3 3 2 3 3 3 3 3
## [3146] 2 2 3 3 2 2 3 3 2 3 2 2 2 3 3 3 3 3 3 3 2 3 2 2 2 2 2 3 2 2 3 1 2 2 2 3 3
## [3183] 2 2 2 2 2 2 3 3 2 3 3 2 2 3 2 3 3 2 2 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 2 3
## [3220] 3 2 2 3 3 2 2 2 3 2 3 3 3 3 2 3 2 3 2 3 2 2 2 2 2 2 2 3 3 2 3 2 3 2 2 3 3
## [3257] 2 2 2 2 2 3 2 2 3 3 3 3 3 2 3 2 2 2 2 3 3 3 3 2 2 2 2 2 2 2 2 2 3 2 2 3 2
## [3294] 2 3 2 3 3 3 3 2 2 2 2 3 3 2 1 2 3 3 3 3 2 3 3 3 2 3 3 3 2 3 3 2 3 2 2 3 3
## [3331] 2 3 3 2 3 3 2 3 3 2 3 2 2 2 2 3 2 3 3 2 3 2 3 3 2 2 2 3 2 2 3 3 2 2 2 2 2
## [3368] 3 3 2 3 2 2 3 2 2 3 3 3 2 3 2 1 3 2 3 3 3 3 2 3 3 2 3 2 3 3 2 2 2 3 2 2 2
## [3405] 2 3 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 2 2 2 2 2 2 2 2 3 2 2 3 2 2 3 3
## [3442] 2 2 3 2 2 2 2 3 3 3 3 3 3 2 1 3 3 2 3 2 3 2 3 1 2 2 3 2 3 3 2 2 2 2 3 3 2
## [3479] 2 2 2 2 2 2 2 3 2 2 3 2 2 2 2 2 2 2 2 2 3 2 3 2 3 2 2 3 3 3 2 2 3 3 3 3 3
## [3516] 2 3 2 2 2 2 2 3 3 3 3 3 3 2 2 2 1 2 3 2 2 3 3 2 2 2 2 2 2 2 2 2 3 2 2 1 3
## [3553] 2 2 2 3 2 2 3 3 2 3 2 2 2 3 2 3 3 3 3 2 2 2 2 2 2 3 2 2 2 2 2 2 2 3 2 2 3
## [3590] 3 2 3 3 2 2 2 2 2 2 3 3 3 3 3 2 2 3 3 3 3 3 3 2 3 3 3 3 3 2 3 3 2 3 2 2 1
## [3627] 2 2 2 3 2 3 2 3 3 3 2 3 3 2 3 3 3 2 3 2 2 2 2 2 2 2 3 3 3 2 3 2 2 2 3 3 3
## [3664] 2 3 3 3 3 3 3 3 2 2 2 3 3 3 3 2 3 2 2 2 3 3 3 2 3 3 2 2 3 2 2 2 2 2 2 3 2
## [3701] 3 2 3 3 2 2 2 2 2 2 2 2 3 2 2 3 3 3 3 2 3 2 2 3 2 2 2 3 3 3 2 2 3 3 2 3 2
## [3738] 3 2 2 2 2 3 2 2 3 2 2 3 3 2 3 3 2 2 3 3 3 3 3 3 1 3 2 3 3 2 2 2 2 2 2 2 2
## [3775] 2 2 3 2 3 2 3 2 2 2 2 1 3 2 2 3 2 2 2 2 3 3 3 2 2 2 3 2 3 3 2 2 3 3 3 3 3
## [3812] 3 3 3 2 3 3 3 3 3 2 3 2 3 2 2 2 2 3 2 3 3 3 3 2 2 2 3 2 2 2 2 2 2 3 2 3 3
## [3849] 2 2 3 2 2 3 3 3 2 2 2 2 3 3 2 2 2 2 2 2 2 3 2 3 3 3 2 2 3 2 2 3 3 2 2 2 2
## [3886] 2 2 2 3 3 3 3 3 3 2 3 2 3 3 3 2 2 2 3 3 3 2 2 3 2 3 2 3 3 2 3 2 2 3 3 3 3
## [3923] 3 2 3 2 3 2 2 2 3 2 3 2 2 2 2 2 3 3 2 3 3 3 2 2 3 3 2 2 2 3 3 3 3 3 3 2 2
## [3960] 2 3 3 2 2 3 2 2 2 3 3 2 3 3 3 3 3 3 2 2 2 3 3 3 3 2 3 3 3 3 3 2 3 3 2 2 2
## [3997] 3 3 3 3 3 2 3 2 2 3 3 2 3 2 2 2 3 2 3 2 2 3 2 3 2 2 3 2 3 3 2 3 2 2 2 2 2
## [4034] 2 2 2 3 2 2 2 3 2 2 2 2 2 2 3 3 2 2 3 3 2 2 2 2 2 2 2 3 3 2 3 2 2 3 3 3 2
## [4071] 3 3 2 3 1 2 3 2 2 3 2 2 2 2 2 2 3 3 2 3 2 2 2 2 3 2 2 2 3 2 2 3 3 2 2 2 3
## [4108] 2 2 2 2 3 3 3 3 3 2 3 2 2 3 3 3 2 3 3 3 3 2 3 2 2 2 3 3 2 2 2 3 2 3 2 2 3
## [4145] 3 2 3 2 2 3 2 2 3 3 3 2 3 3 3 3 2 3 3 3 2 3 3 2 3 3 3 3 2 3 2 2 2 3 2 2 2
## [4182] 2 2 2 2 2 3 2 1 2 3 3 2 3 3 2 3 2 2 3 2 3 3 3 3 3 2 2 3 2 2 3 2 3 2 2 2 3
## [4219] 2 2 2 3 2 3 2 3 3 2 3 3 3 2 2 2 3 2 2 3 3 3 2 3 3 2 3 3 3 2 2 2 3 3 2 2 2
## [4256] 2 3 2 3 2 3 3 3 3 2 3 3 1 2 3 3 2 3 3 3 3 3 3 2 3 2 3 3 3 3 3 2 2 2 3 3 3
## [4293] 3 3 3 2 3 3 3 3 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 3 3 3 2 3 3 3 2 3 3 2 3
## [4330] 2 2 3 3 2 2 3 3 2 3 2 2 2 3 3 3 3 2 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 3 2 3 3
## [4367] 2 3 3 2 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 3 2 2 2 2 2 2 3 2 3 3 2 3 3 2 2 3 3
## [4404] 3 3 2 2 2 3 3 3 3 3 3 3 3 3 2 3 2 2 2 3 2 3 2 3 3 2 2 2 3 3 3 3 2 2 3 3 3
## [4441] 3 3 3 3 3 3 3 3 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2
## [4478] 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 3 3 3 2 2 3 2 3 3 2 3 3 3 2 3 3 3 2 2 3 2 3
## [4515] 3 3 3 3 2 2 3 3 3 2 2 3 3 3 2 3 3 2 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 2 2 3 3
## [4552] 3 3 3 3 3 3 3 3 2 3 2 3 3 3 2 2 3 3 3 3 3 2 2 3 3 2 2 3 3 3 2 3 3 3 3 3 3
## [4589] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 2 2 3 2 3 3 3 3 3 3 2 2 3
## [4626] 3 2 3 3 3 2 3 2 2 3 3 2 3 2 2 2 2 2 2 3 3 3 3 2 2 2 2 3 3 3 3 3 2 3 2 2 3
## [4663] 2 2 2 2 3 3 3 3 3 2 3 2 3 2 2 3 3 3 2 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 3
## [4700] 3 3 3 3 3 3 3 3 2 2 3 3 3 2 3 3 3 3 3 3 2 3 3 3 2 2 3 3 3 3 3 2 3 3 3 2 3
## [4737] 3 3 2 2 3 3 3 2 2 2 2 2 2 3 2 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3
## [4774] 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 2
## [4811] 2 3 2 3 3 3 2 3 3 3 3 3 2 3 3 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [4848] 3 3 3 2 3 3 3 2 2 2 2 2 2 3 2 3 2 2 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3 3 2 3
## [4885] 3 3 3 2 2 3 3 2 3 3 2 2 2 3 3 3 3 3 3 3 3 3 2 2 3 3 3 3 3 2 3 3 3 3 3 2 3
## [4922] 3 3 3 3 3 3 3 3 2 2 3 3 3 2 2 2 3 3 3 3 3 2 2 2 2 3 3 2 3 3 3 3 3 3 2 3 3
## [4959] 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 2 3 3 2 3 3 3 3 3 3 2 2
## [4996] 2 3 2 3 3 3 3 3 3 3 3 2 2 3 3 3 2 2 3 3 2 3 2 3 2 3 3 2 3 2 2 2 3 2 2 2 3
## [5033] 3 3 3 3 2 2 2 3 3 3 3 2 3 2 2 3 3 3 3 2 3 3 3 3 3 3 2 2 3 3 3 2 3 3 2 3 2
## [5070] 3 2 3 3 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 3 3 3 3
## [5107] 3 2 2 2 3 3 3 3 3 3 3 3 3 2 3 2 2 3 3 3 2 3 3 2 2 3 2 2 2 3 3 3 3 3 3 2 3
## [5144] 3 2 2 2 3 3 3 2 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 2 3 3 3 3
## [5181] 3 3 3 3 3 3 2 2 2 3 3 3 3 3 3 3 2 3 2 2 3 3 3 3 2 2 3 2 2 2 3 3 2 2 3 2 3
## [5218] 2 2 2 2 3 2 3 2 3 2 2 2 3 3 2 3 3 3 3 3 3 3 3 2 2 3 3 3 3 3 3 3 3 2 3 2 3
## [5255] 3 3 2 3 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 2 3 2 3 3 2 3 2 2 2 3 3 3 3
## [5292] 3 3 2 3 3 2 3 2 3 2 2 2 2 2 3 3 2 3 3 2 2 2 3 2 3 3 2 2 3 3 3 3 3 3 3 3 3
## [5329] 3 2 2 3 3 2 3 3 2 3 2 2 2 2 2 2 2 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2
## [5366] 2 2 2 2 3 2 2 3 2 3 3 3 3 3 2 3 3 2 2 3 2 2 2 2 2 2 2 2 3 2 2 2 3 3 3 3 3
## [5403] 3 3 3 2 3 3 3 2 3 3 3 2 3 2 3 3 3 3 2 3 2 3 3 3 3 3 3 3 3 3 2 2 2 3 3 2 2
## [5440] 3 2 3 3 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 3 3 3 2 2 2 2 2 2 2 2 3 2
## [5477] 3 2 3 2 2 3 3 2 3 3 3 2 2 3 3 2 3 3 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3
## [5514] 3 3 3 2 3 3 3 2 3 3 3 3 2 3 2 2 3 3 3 3 3 2 3 2 3 3 2 3 3 3 3 3 3 3 3 2 3
## [5551] 2 3 3 2 3 3 3 3 3 2 2 3 3 2 3 3 2 2 3 2 3 3 2 2 3 3 3 3 3 2 3 3 3 3 2 2 2
## [5588] 3 2 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 2 3 3 3 3 2 2 3 2 3 2 3 2 3 2 2 3
## [5625] 3 2 3 3 3 3 3 2 3 2 3 3 3 3 3 2 2 2 2 2 2 2 2 2 3 2 3 2 2 2 3 2 3 2 3 3 3
## [5662] 3 3 3 3 3 2 3 2 2 3 2 3 2 2 2 3 3 3 2 2 3 3 3 3 3 3 3 3 2 3 3 2 3 2 3 3 3
## [5699] 3 2 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 2 2 3 3 2 2 2 3 3 2 2 2 2 3 3 2 2 3 3 3
## [5736] 3 2 3 3 2 2 3 2 2 2 2 3 2 3 2 3 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 2 3 3
## [5773] 2 2 3 2 3 3 2 2 3 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 3 3 3 2 3 2 3 3 3
## [5810] 3 2 2 3 2 2 2 3 3 2 3 3 3 3 3 2 2 3 2 3 3 3 3 3 3 2 2 3 2 3 2 3 3 3 2 2 3
## [5847] 3 3 3 2 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 3 2 2 2 2 3
## [5884] 3 3 3 3 3 2 2 2 3 2 3 2 3 3 3 2 2 2 2 3 3 3 2 3 3 3 2 3 3 3 3 3 2 3 3 3 2
## [5921] 2 3 3 3 2 2 2 3 2 2 2 2 2 2 2 2 2 2 3 2 2 3 3 2 3 2 3 2 2 3 3 3 3 3 3 3 3
## [5958] 2 2 3 3 3 2 3 2 3 2 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 2 2 3 3 2 3 2 2 2 2 2
## [5995] 2 2 2 2 2 2 3 3 2 2 2 3 3 2 3 2 3 3 3 3 2 2 3 2 2 2 2 3 3 2 3 3 2 3 3 2 3
## [6032] 3 2 3 3 3 3 3 3 2 3 2 3 3 3 3 3 3 2 2 2 3 3 2 2 2 2 3 3 2 3 3 3 2 2 3 2 3
## [6069] 3 3 3 3 1 2 3 3 2 2 3 2 2 3 3 3 3 3 3 3 3 3 3 3 2 3 2 3 2 3 3 3 2 3 3 3 2
## [6106] 3 3 3 3 3 3 3 3 2 3 3 3 3 3 2 3 3 2 2 3 2 3 3 2 2 2 3 2 3 2 2 3 3 3 3 3 3
## [6143] 3 3 3 3 3 2 3 3 3 3 3 3 2 3 2 3 3 3 3 3 3 3 2 2 2 3 3 3 3 3 3 3 3 3 2 3 2
## [6180] 2 3 3 2 3 3 3 3 3 3 3 2 3 2 3 3 3 3 2 3 3 3 2 3 3 3 3 2 3 3 3 3 2 2 2 3 3
## [6217] 3 3 2 3 3 3 3 3 3 2 3 3 3 2 3 2 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 2 3 3 3 2 3
## [6254] 2 2 2 2 3 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3 3 3 2 3 2 3 3 3 3 3 3 2 3 2 2 2 2
## [6291] 2 3 2 2 3 3 3 2 2 2 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 2 3 2
## [6328] 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 2 2 3 2 2 3 2 3 3 2 3 3 3 3 3 3 2 3 3 3
## [6365] 3 3 3 2 2 2 2 2 3 3 3 3 3 2 3 3 3 2 2 2 3 3 3 2 3 3 2 3 2 2 3 3 3 2 3 3 3
## [6402] 3 3 3 3 2 3 3 3 2 3 3 2 3 3 2 3 3 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 2 3 3
## [6439] 3 3 3 3 3 3 3 3 3 3 2 3 2 3 3 3 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3
## [6476] 3 3 3 2 2 2 3 3 2 2 3 3 3 2 3 3 3 3 2 3 3 3
##
## Within cluster sum of squares by cluster:
## [1] 18604.50 14887.52 21116.02
## (between_SS / total_SS = 35.3 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
clustered <- num %>% mutate(cluster = factor(km$cluster))
cluster_sizes <- clustered %>% count(cluster, name = "n")
cluster_sizes %>% gt() %>% tab_header(title = md("**Cluster Sizes**"))
| Cluster Sizes | |
| cluster | n |
|---|---|
| 1 | 1604 |
| 2 | 1962 |
| 3 | 2931 |
# Standardized centroids (from kmeans)
centers_z <- as_tibble(km$centers, .name_repair = "minimal") %>%
mutate(cluster = factor(1:n())) %>%
relocate(cluster)
# Convert standardized centers back to original scale:
num_means <- sapply(num, mean)
num_sds <- sapply(num, sd)
centers_orig <- centers_z
for (v in names(num)) {
centers_orig[[v]] <- centers_z[[v]] * num_sds[[v]] + num_means[[v]]
}
centers_z %>% gt() %>% tab_header(title = md("**Cluster Centroids (Z-Scores)**"))
| Cluster Centroids (Z-Scores) | |||||||||||||
| cluster | x1 | fixed_acidity | volatile_acidity | citric_acid | residual_sugar | chlorides | free_sulfur_dioxide | total_sulfur_dioxide | density | p_h | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -1.2794081 | 0.8698558 | 1.1673900 | -0.330034429 | -0.6134625 | 0.91468449 | -0.83830632 | -1.21545351 | 0.6967087 | 0.55452951 | 0.8402120 | -0.1063270 | -0.2560702 |
| 2 | 0.3309097 | -0.1795612 | -0.3394396 | 0.266846681 | 1.1358154 | -0.08550749 | 0.80852466 | 0.94887336 | 0.7304700 | -0.38111630 | -0.2602038 | -0.8045438 | -0.3015520 |
| 3 | 0.4786509 | -0.3558340 | -0.4116387 | 0.001986365 | -0.4245909 | -0.44332590 | -0.08245719 | 0.02998905 | -0.8702500 | -0.04835044 | -0.2856295 | 0.5967463 | 0.3419930 |
centers_orig %>% gt() %>% tab_header(title = md("**Cluster Centroids (Original Units)**"))
| Cluster Centroids (Original Units) | |||||||||||||
| cluster | x1 | fixed_acidity | volatile_acidity | citric_acid | residual_sugar | chlorides | free_sulfur_dioxide | total_sulfur_dioxide | density | p_h | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 849.2569 | 8.343017 | 0.5318610 | 0.2706733 | 2.524501 | 0.08807855 | 15.64589 | 47.04489 | 0.9967858 | 3.307662 | 0.6562968 | 10.364983 | 5.594763 |
| 2 | 3869.6764 | 6.982518 | 0.2837819 | 0.3574108 | 10.847222 | 0.05303823 | 44.87615 | 169.37666 | 0.9968871 | 3.157222 | 0.4925484 | 9.532212 | 5.555046 |
| 3 | 4146.7895 | 6.753992 | 0.2718953 | 0.3189219 | 3.423115 | 0.04050256 | 29.06175 | 117.43961 | 0.9920870 | 3.210727 | 0.4887649 | 11.203547 | 6.117025 |
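The conversion back to original units uses original = z × sd + mean for each feature. As a sanity check, the sketch below (synthetic data, assumed names) confirms that back-transformed centroids match the plain per-cluster averages of the raw values, which is what makes the original-units table safe to read as "the average wine in each cluster".

```r
set.seed(1)

# Toy data: two features, two obvious groups (synthetic values)
toy <- data.frame(
  alcohol        = c(rnorm(20, 9.5, 0.2), rnorm(20, 12, 0.2)),
  residual_sugar = c(rnorm(20, 10, 0.5),  rnorm(20, 2, 0.5))
)

# Standardize with explicit means and sds so they can be reused below
mu    <- sapply(toy, mean)
sigma <- sapply(toy, sd)
toy_z <- scale(toy, center = mu, scale = sigma)

km_toy <- kmeans(toy_z, centers = 2, nstart = 10)

# Back-transform each centroid: original = z * sd + mean
centers_back <- sweep(sweep(km_toy$centers, 2, sigma, `*`), 2, mu, `+`)

# Direct route: average the raw features within each cluster
centers_direct <- as.matrix(
  aggregate(toy, list(cluster = km_toy$cluster), mean)[, -1]
)

all.equal(unname(centers_back), unname(centers_direct))
```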
centers_long <- centers_z %>%
pivot_longer(-cluster, names_to = "feature", values_to = "zscore")
ggplot(centers_long, aes(feature, cluster, fill = zscore)) +
geom_tile() +
scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick", midpoint = 0) +
labs(title = "Cluster Profiles (Standardized Centers)",
x = "Feature", y = "Cluster") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
pca <- prcomp(num_scaled, center = FALSE, scale. = FALSE)
pca_df <- as_tibble(pca$x[, 1:2]) %>%
mutate(cluster = factor(km$cluster))
ggplot(pca_df, aes(PC1, PC2, color = cluster)) +
geom_point(alpha = 0.8, size = 2.5) +
stat_ellipse(type = "t", linetype = 2) +
labs(title = "PCA Projection Colored by Cluster",
subtitle = "Ellipses show approximate cluster spread",
color = "Cluster")
d <- dist(num_scaled)
sil <- silhouette(km$cluster, d)
factoextra::fviz_silhouette(sil) + ggtitle("Silhouette Plot")
## cluster size ave.sil.width
## 1 1 1604 0.22
## 2 2 1962 0.22
## 3 3 2931 0.22
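The silhouette width of a single wine compares a, its average distance to the other wines in its own cluster, with b, its average distance to the nearest other cluster: s = (b - a) / max(a, b). Values near 1 mean the wine sits firmly in its cluster; values near 0 mean it lies between clusters. The base-R sketch below reproduces that calculation on four made-up points; `silhouette_widths` is a hypothetical helper written for illustration, not the routine used by the cluster package.

```r
# Hypothetical helper: silhouette width per point from a distance object
silhouette_widths <- function(d, cluster) {
  d <- as.matrix(d)
  sapply(seq_along(cluster), function(i) {
    own <- cluster == cluster[i]
    own[i] <- FALSE                      # exclude the point itself
    if (!any(own)) return(0)             # singleton cluster: width set to 0
    a <- mean(d[i, own])                 # mean distance within own cluster
    b <- min(sapply(setdiff(unique(cluster), cluster[i]),
                    function(k) mean(d[i, cluster == k])))
    (b - a) / max(a, b)
  })
}

# Toy 1-D example: two tight groups far apart (made-up values)
x       <- c(1, 2, 10, 11)
cluster <- c(1, 1, 2, 2)

s <- silhouette_widths(dist(x), cluster)
round(s, 3)  # 0.895 0.882 0.882 0.895
mean(s)
```

These toy groups score close to 0.9 because they are tight and far apart; an average near 0.22 sits much closer to the "between clusters" end of the scale.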
The silhouette averages (~0.22) indicate modest separation: clusters are somewhat diffuse and overlapping. This suggests the wine chemistry data forms a continuum rather than distinct groupings. For managerial use, these clusters should be interpreted as soft style groupings (‘lighter’, ‘balanced’, ‘richer’) rather than hard categorical classes.

# Interpretation: Personas & Business Takeaways
# Rank features by relative importance per cluster (absolute z-score)
top_features <- centers_long %>%
group_by(cluster) %>%
slice_max(order_by = abs(zscore), n = 5, with_ties = FALSE) %>%
arrange(cluster, desc(abs(zscore)))
top_features %>%
group_by(cluster) %>%
summarise(top5 = paste0(feature, " (", sprintf("%.2f", zscore), "z)", collapse = ", ")) %>%
gt() %>%
tab_header(title = md("**Top-Defining Features (by Cluster)**")) %>%
cols_label(top5 = "Most Distinctive Signals")
| Top-Defining Features (by Cluster) | |
| cluster | Most Distinctive Signals |
|---|---|
| 1 | x1 (-1.28z), total_sulfur_dioxide (-1.22z), volatile_acidity (1.17z), chlorides (0.91z), fixed_acidity (0.87z) |
| 2 | residual_sugar (1.14z), total_sulfur_dioxide (0.95z), free_sulfur_dioxide (0.81z), alcohol (-0.80z), density (0.73z) |
| 3 | density (-0.87z), alcohol (0.60z), x1 (0.48z), chlorides (-0.44z), residual_sugar (-0.42z) |
How to read this. Positive z-scores = above-average for that feature; negative = below-average. Use this to name clusters (“High-Body & Phenolics”, “Light-Body & Aromatic”, etc.).
Example guidance:
- Cluster 1, “Rich & Structured”: Above average on volatile acidity, chlorides, fixed acidity, and sulphates, with low total sulfur dioxide. Implication: Position as a premium, age-worthy style; expect higher willingness-to-pay in certain markets.
- Cluster 2, “Bright & Aromatic”: Higher residual sugar and sulfur dioxide with lower alcohol; lighter profile. Implication: Emphasize freshness/food-pairing; good for by-the-glass programs.
- Cluster 3, “Balanced & Approachable”: Close to average on many features, with lower density and somewhat higher alcohol and quality. Implication: Volume play; accessible entry SKU.
Tie these to audience goals (e.g., SKU rationalization, pricing ladders, portfolio storytelling).
- Try an alternative k (e.g., 2 or 4) and confirm that business narratives remain stable.
- Re-run k-means with different nstart to check stability.
list(
k_chosen = k_best,
avg_silhouette = mean(sil[, "sil_width"]),
tot_withinss = km$tot.withinss,
betweenss = km$betweenss
)
## $k_chosen
## [1] 3
##
## $avg_silhouette
## [1] 0.2179186
##
## $tot_withinss
## [1] 54608.04
##
## $betweenss
## [1] 29839.96
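The stability check suggested above (re-running k-means with different starts) can be sketched as follows. This is an illustrative base-R example on synthetic blobs; the helper `agreement` is hypothetical. Because cluster labels are arbitrary (cluster 1 in one run may be cluster 3 in another), the helper cross-tabulates two runs and measures how many points fall into the best-matching cluster.

```r
set.seed(7)

# Toy data: three synthetic blobs (illustration only)
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 8, sd = 0.4), ncol = 2)
)

# Hypothetical helper: share of points whose cluster in run `a`
# lands in its best-matching cluster in run `b` (labels may be permuted)
agreement <- function(a, b) {
  tab <- table(a, b)
  sum(apply(tab, 1, max)) / length(a)
}

# Reference fit, then refits under different seeds and nstart values
ref  <- kmeans(toy, centers = 3, nstart = 50)$cluster
runs <- expand.grid(seed = c(1, 2, 3), nstart = c(5, 25, 50))
runs$agreement <- mapply(function(seed, nstart) {
  set.seed(seed)
  agreement(ref, kmeans(toy, centers = 3, nstart = nstart)$cluster)
}, runs$seed, runs$nstart)

runs
```

Agreement close to 1 across rows suggests the segmentation does not depend on the random start; noticeably lower values would be a reason to treat the personas with more caution.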
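What we learned. With k = 3, the wines split into three groups with different average feature profiles, although the modest average silhouette (about 0.22) means the groups overlap rather than separate cleanly. These profiles can be named and actioned for portfolio strategy, pricing tiers, and storytelling.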
How this helps our audience. Instead of raw chemistry variables, we deliver clear personas that inform go-to-market choices. Managers can: (1) prioritize SKUs that anchor each persona, (2) tailor messaging to the profile, and (3) align pricing with perceived value.