From Chemical Composition to Perceived Quality: Dual-Space MDS and Preference Mapping of Wines
# library
library(tidyverse)
library(knitr)
library(kableExtra)
library(clusterSim)
library(smacof)
library(StatMatch)
library(corrplot)
library(pheatmap)
library(factoextra)
library(ggrepel)
1 Introduction
The objective of this project is to analyze how the perceptual space of wine quality differs when it is constructed from objective chemical composition versus when subjective quality information is incorporated into the dissimilarity structure. The two spaces are compared using metric and non-metric multidimensional scaling (MDS).
The analysis is based on the red wine quality dataset introduced by Cortez et al. (2009), which contains physicochemical measurements of wines together with expert quality ratings. Rather than treating quality as a prediction target, this project focuses on understanding how wines are organized perceptually and how quality relates to similarity patterns within that space.
The dataset is inherently high-dimensional, consisting of multiple correlated chemical attributes such as acidity measures, sulphur compounds, alcohol content, and density. In such settings, conventional exploratory tools and direct inspection of pairwise relationships often fail to reveal clear and interpretable structures. While linear dimensionality reduction techniques such as principal component analysis (PCA) are useful for variance compression, they rely on linear assumptions and do not explicitly model perceptual similarity. As a result, they may be insufficient for capturing how complex products like wine are cognitively differentiated.
Multidimensional scaling provides an alternative exploratory framework by focusing on dissimilarity relationships rather than raw variables. By constructing distance matrices and embedding them into a low-dimensional space, MDS allows perceptual structure to emerge without imposing strong parametric assumptions. In this project, MDS is used to construct wine perceptual spaces under different similarity definitions, enabling a comparison between purely chemical similarity and perceptually enriched similarity that incorporates wine quality information. Through the analysis of distance structures, stress diagnostics, and preference patterns, the project examines how objective composition aligns with—and diverges from—perceived wine quality.
2 Key References Informing the Project
The analysis follows several key ideas from the literature that directly guide the methodological choices of this project:
- Non-metric MDS as a tool for perceptual representation. The theoretical foundation of non-metric multidimensional scaling establishes that ordinal similarity information can be faithfully represented in low-dimensional perceptual spaces without assuming linear relationships or metric proportionality (Kruskal, 1964). This principle motivates the use of non-metric MDS to model wine perception.
- Perceptual mapping of wines using MDS. Applications of MDS in wine research demonstrate how perceptual spaces can be interpreted through post-hoc associations with chemical and sensory attributes, rather than direct model coefficients (Johnson et al., 2013). This approach informs the interpretation strategy adopted in this project.
- Limits of physicochemical variables in explaining quality. The wine quality dataset introduced by Cortez et al. (2009) shows that chemical variables explain quality only partially, highlighting the need for exploratory and perceptual analysis rather than purely predictive modelling.
- Flexible similarity definitions for mixed and subjective data. Gower’s similarity coefficient enables the integration of quantitative measurements and subjective evaluations within a unified distance framework (Gower, 1971), supporting the construction of perceptually enriched dissimilarity matrices.
- Preference mapping and ideal-point interpretation. Preference mapping literature emphasizes the interpretation of ideal points and perceptual inconsistencies rather than direct prediction of liking, providing the conceptual basis for the ideal-point analysis conducted in this project (Sánchez et al., 2019).
3 Distance construction
3.1 Data introduction and exploratory descriptive statistics
Following the approach of the Australian Shiraz study (Johnson et al., 2013), the dataset is first summarized before MDS is applied: the mean and standard deviation (SD) of each physicochemical indicator are reported for each quality group. This shows whether the wines differ systematically at the level of the original variables before the “perceptual space” is generated.
wine <- read.csv("winequality-red.csv", sep = ",")
wine$quality_group <- case_when(
  wine$quality <= 5 ~ "Low",
  wine$quality == 6 ~ "Medium",
  wine$quality >= 7 ~ "High"
)
descriptive_table <- wine %>%
  group_by(quality_group) %>%
  summarise(across(
    .cols = where(is.numeric),
    .fns = list(mean = mean, sd = sd),
    .names = "{.col}_{.fn}"
  ))
descriptive_table %>%
  pivot_longer(cols = -quality_group, names_to = "metric", values_to = "value") %>%
  separate(metric, into = c("parameter", "stat"), sep = "_") %>%
  pivot_wider(names_from = c(quality_group, stat), values_from = value) %>%
  kable(caption = "Physicochemical indicators and quality group comparison", digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  add_header_above(c(" " = 1, "High Quality" = 2, "Low Quality" = 2, "Medium Quality" = 2))

| parameter | High_mean | High_sd | Low_mean | Low_sd | Medium_mean | Medium_sd |
|---|---|---|---|---|---|---|
| fixed.acidity | 8.85 | 2.00 | 8.14 | 1.57 | 8.35 | 1.80 |
| volatile.acidity | 0.41 | 0.14 | 0.59 | 0.18 | 0.50 | 0.16 |
| citric.acid | 0.38 | 0.19 | 0.24 | 0.18 | 0.27 | 0.20 |
| residual.sugar | 2.71 | 1.36 | 2.54 | 1.39 | 2.48 | 1.44 |
| chlorides | 0.08 | 0.03 | 0.09 | 0.06 | 0.08 | 0.04 |
| free.sulfur.dioxide | 13.98 | 10.23 | 16.57 | 10.89 | 15.71 | 9.94 |
| total.sulfur.dioxide | 34.89 | 32.57 | 54.65 | 36.72 | 40.87 | 25.04 |
| density | 1.00 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 |
| pH | 3.29 | 0.15 | 3.31 | 0.15 | 3.32 | 0.15 |
| sulphates | 0.74 | 0.13 | 0.62 | 0.18 | 0.68 | 0.16 |
| alcohol | 11.52 | 1.00 | 9.93 | 0.76 | 10.63 | 1.05 |
| quality | 7.08 | 0.28 | 4.90 | 0.34 | 6.00 | 0.00 |
1. The “Core Drivers” of Quality: Alcohol and Volatile Acidity. These two indicators show the largest differences between groups in the table and are key to distinguishing High from Low.
- Alcohol: Mean alcohol content rises steadily with quality. The High group has the highest mean (\(11.52\)), followed by Medium (\(10.63\)), and the Low group has the lowest (\(9.93\)). This suggests that in the physicochemical space, alcohol concentration is a main “long axis” along which samples are spread.
- Volatile Acidity: Mean volatile acidity falls as quality rises. The High group has the lowest mean (\(0.41\)), while the Low group has the highest (\(0.59\)). Volatile acids usually give a vinegar-like taste, which plausibly explains why these wines were rated low.
2. Structure and Stability: Acidity and Sulphates.
- Fixed Acidity and Citric Acid: The High group has higher mean fixed acidity (\(8.85\) vs \(8.14\) in the Low group) and higher mean citric acid (\(0.38\) vs \(0.24\)), although the group standard deviations are large relative to these gaps. High-scoring red wines typically require a more robust acidity structure to support their palate.
- Sulphates: The High group shows the highest mean level (\(0.74\), against \(0.68\) for Medium and \(0.62\) for Low). Sulphates act as antioxidants and antimicrobials, helping to maintain the wine’s quality and freshness.
3.2 Data standardization
Standardization ensures that the dissimilarity structure reflects relative differences in chemical composition rather than dominance by variables with larger numerical ranges.
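The standardization chunk itself is not reproduced in this report, so the following is only a minimal sketch on toy data. The object name `chemical_scaled` matches the one used later in the analysis; the toy values and the choice of base R `scale()` are assumptions, not the project's confirmed code.

```r
# Toy stand-in for three chemical columns of the wine data (illustrative values).
toy_wine <- data.frame(
  alcohol              = c(9.4, 9.8, 11.2, 12.5, 10.1),
  volatile.acidity     = c(0.70, 0.88, 0.28, 0.36, 0.55),
  total.sulfur.dioxide = c(34, 67, 60, 18, 45)
)

# Centre each column to mean 0 and rescale it to standard deviation 1.
chemical_scaled <- scale(toy_wine)

# After scaling, every variable is on the same numeric footing.
round(colMeans(chemical_scaled), 10)
apply(chemical_scaled, 2, sd)
```

Without this step, a variable such as total sulfur dioxide (tens of units) would dominate a variable such as volatile acidity (tenths of a unit) in any Euclidean distance.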
3.3 Construction of the two distance spaces
3.3.1 Space A: Euclidean distance (chemical space)
This is an objective compositional space. Using Euclidean distance implies that differences between wines are continuous and linear and that all variables are equally weighted.
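As an illustration of how Space A could be built, the sketch below computes Euclidean distances on a toy standardized matrix with base R `dist()`. The name `distance_space_A` follows the later code in this report; the random toy data are an assumption for demonstration only.

```r
set.seed(1)
# Toy standardized chemical matrix: 6 wines x 3 attributes (illustrative only).
chemical_scaled <- scale(matrix(
  rnorm(18), nrow = 6,
  dimnames = list(NULL, c("alcohol", "pH", "sulphates"))
))

# Pairwise Euclidean distances between wines (rows).
distance_space_A <- dist(chemical_scaled, method = "euclidean")

# Sanity check: the distance between wines 1 and 2 equals the square root
# of the summed squared differences, with every variable weighted equally.
manual_d12 <- sqrt(sum((chemical_scaled[1, ] - chemical_scaled[2, ])^2))
all.equal(as.matrix(distance_space_A)[1, 2], manual_d12)
```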
3.3.2 Space B: Gower distance with quality embedded
Gower distance allows for mixed-type variables. In this space, quality acts as an anchoring variable in the perceptual dissimilarity structure: it is up-weighted prior to the Gower distance computation, which embeds the subjective evaluation into the dissimilarity definition. The up-weighting increases the influence of quality on dissimilarity while preserving the ordinal structure required for non-metric MDS.
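One way to sketch a quality-weighted Gower distance is shown below. Note the substitutions: the report loads `StatMatch`, but this example uses `cluster::daisy()`, whose `weights` argument applies per-variable weights for `metric = "gower"`. The weight of 3 and the toy values are hypothetical; the report does not state the weight that was actually used.

```r
library(cluster)  # daisy() computes Gower dissimilarities with optional weights

# Toy data: two chemical attributes plus the expert quality score (illustrative).
toy_wine <- data.frame(
  alcohol          = c(9.4, 9.8, 11.2, 12.5, 10.1, 11.9),
  volatile.acidity = c(0.70, 0.88, 0.28, 0.36, 0.55, 0.31),
  quality          = c(5, 5, 6, 7, 6, 7)
)

# Hypothetical weight: quality counts three times as much as each chemical variable.
quality_weight <- 3

gower_unweighted <- daisy(toy_wine, metric = "gower")
distance_space_B <- daisy(toy_wine, metric = "gower",
                          weights = c(1, 1, quality_weight))

# Wines 1 and 4 differ by the full quality range, so up-weighting quality
# pushes them further apart than in the unweighted version.
as.matrix(gower_unweighted)[1, 4]
as.matrix(distance_space_B)[1, 4]
```

Because Gower's coefficient divides each numeric difference by the variable's range, the resulting dissimilarities stay within 0 and 1 regardless of the weight chosen.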
3.4 Heatmap
Distance heatmaps are used as a diagnostic tool prior to dimensionality reduction: they provide insight into the underlying similarity structure of the data and reveal structural differences between the two spaces before MDS is applied.
The Euclidean distance matrix, constructed solely from standardized chemical attributes, exhibits a largely continuous and homogeneous pattern. Distances vary smoothly across observations, and no clear block structure emerges. This suggests that differences in chemical composition among wines primarily follow gradual gradients rather than forming sharply separated groups. A small number of observations display consistently high distances to most others, indicating the presence of chemically extreme wines that may later influence stress and outlier diagnostics.
In contrast, the Gower distance matrix anchored to wine quality exhibits a pronounced block-diagonal structure. Intra-block distances are relatively small, while inter-block distances are markedly larger. This pattern suggests that, with the introduction of quality information, similarity relationships are reorganized into more structured perceptual partitions. This structure follows directly from how dissimilarity is defined, since quality itself enters the distance.
These observations already imply that different definitions of similarity encode fundamentally different structures in the same dataset.
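The block structure seen in a heatmap can also be summarized numerically. The sketch below, on simulated data, compares the mean between-group distance with the mean within-group distance for both spaces. The helper `block_ratio()`, the weight of 10, and the use of `cluster::daisy()` are all illustrative assumptions, not the report's own code.

```r
library(cluster)

set.seed(42)
n <- 30
# Toy data: three random "chemical" variables and a quality score (illustrative).
toy_wine <- data.frame(
  chem1   = rnorm(n),
  chem2   = rnorm(n),
  chem3   = rnorm(n),
  quality = rep(c(5, 6, 7), each = n / 3)
)

# Space A: Euclidean distance on standardized chemistry only.
d_A <- as.matrix(dist(scale(toy_wine[, c("chem1", "chem2", "chem3")])))
# Space B: Gower distance with quality up-weighted (weight 10 is hypothetical).
d_B <- as.matrix(daisy(toy_wine, metric = "gower", weights = c(1, 1, 1, 10)))

# Ratio of mean between-group to mean within-group distance.
# Values well above 1 correspond to visible blocks in a quality-ordered heatmap.
block_ratio <- function(d, groups) {
  same  <- outer(groups, groups, "==")
  upper <- upper.tri(d)
  mean(d[upper & !same]) / mean(d[upper & same])
}

block_ratio(d_A, toy_wine$quality)
block_ratio(d_B, toy_wine$quality)
```

In this toy setup the chemistry is unrelated to quality by construction, so any block structure in Space B comes purely from the distance definition, which is the point made in the text.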
4 Constructing the perceptual space with MDS
A central contribution of this analysis is the dual-space contrast between two perceptual representations of wine similarity.
This contrast allows us to evaluate whether patterns of objective chemical similarity align with perceived quality, or whether perceptual organization follows a distinct logic.
4.1 Metric MDS (Space A)
A chemical space, defined by Euclidean distances among standardized physicochemical attributes.
This space serves as the baseline, providing a physical benchmark for comparison with Space B. Chemical similarity alone may be insufficient to capture perceptual differences in wine quality.
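The fitting call for Space A is not shown in this report. A plausible sketch, mirroring the `type = "ratio"` call used later in the scree-plot code, is given below on toy data. The object name `mds_space_A` follows the later chunks; the simulated input is an assumption.

```r
library(smacof)

set.seed(7)
# Toy standardized chemical matrix: 20 wines x 4 attributes (illustrative only).
chemical_scaled <- scale(matrix(rnorm(80), nrow = 20))
distance_space_A <- dist(chemical_scaled)

# Ratio (metric) MDS: configuration distances are fitted to be
# proportional to the input dissimilarities.
mds_space_A <- mds(delta = distance_space_A, ndim = 2, type = "ratio")

mds_space_A$stress       # stress of the two-dimensional solution
head(mds_space_A$conf)   # fitted coordinates, one row per wine
```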
4.2 Non-metric MDS (Space B)
A perceptual space, defined by Gower distances that embed subjective quality information.
This is the core space of the project. Non-metric MDS does not require proportional relationships between distances and dissimilarities; it retains only their rank order, which is arguably closer to “human judgment mechanisms”.
mds_space_B <- mds(
  delta = distance_space_B,
  ndim = 2,
  type = "ordinal"
)
mds_coordinates_B <- as.data.frame(mds_space_B$conf)
colnames(mds_coordinates_B) <- c("MDS1", "MDS2")
The comparison of stress values across dimensions reinforces this distinction. Metric MDS applied to the Euclidean space exhibits relatively high stress at low dimensionality, indicating that the chemical distance structure is difficult to faithfully embed in two dimensions. In contrast, non-metric MDS applied to the Gower space achieves substantially lower stress for the same dimensionality, suggesting that the ordinal structure of perceptual similarity is more compatible with a low-dimensional representation. Taken together, these results indicate that perceptually enriched similarity definitions yield more coherent and compressible structures than purely chemical ones.
Although MDS dimensions do not possess intrinsic loadings in the same sense as principal components, post-hoc correlations with original variables provide meaningful interpretive guidance.
In the Euclidean chemical space, the first MDS dimension shows strong associations with alcohol content and density, suggesting that it reflects a concentration-related gradient. Wines with higher alcohol levels tend to occupy one end of this dimension, while lower-alcohol wines cluster at the opposite end. The second dimension is more closely associated with acidity-related variables and sulphates, capturing a distinct chemical contrast orthogonal to overall concentration.
These interpretations are consistent with applied MDS practice in sensory and food science research, where dimensions are understood as emergent axes inferred from correlations rather than predefined constructs.
Importantly, such axis interpretation is intentionally restricted to the chemical space. In the perceptual Gower space, where subjective quality is embedded directly into the distance definition, interpreting dimensions in purely chemical terms would conflate objective and subjective information.
4.3 Stress and Scree plot
stress_values <- data.frame(
  Dimensions = 1:6,
  Stress_A = sapply(1:6, function(k)
    mds(distance_space_A, ndim = k, type = "ratio")$stress),
  Stress_B = sapply(1:6, function(k)
    mds(distance_space_B, ndim = k, type = "ordinal")$stress)
)
ggplot(stress_values, aes(Dimensions)) +
  geom_line(aes(y = Stress_A, color = "Space A")) +
  geom_line(aes(y = Stress_B, color = "Space B")) +
  labs(title = "Stress Scree Plot",
       y = "Stress",
       color = "Space") +
  theme_minimal()
The stress scree plot provides quantitative support for the dual-space contrast. Across all examined dimensionalities, the Gower-based perceptual space exhibits systematically lower stress than the Euclidean chemical space. This difference is particularly pronounced at low dimensionality, indicating that perceptual similarity, when informed by quality evaluation, contains a stronger low-dimensional structure. In contrast, chemical similarity appears more diffuse and requires higher dimensionality to achieve comparable fidelity. Part of the gap should be attributed to the method itself: ordinal MDS fits only the rank order of the dissimilarities and is therefore less constrained than ratio MDS, which tends to lower stress regardless of the distance definition. With that caveat, these findings suggest that the perceptual organization of wine quality is simpler and more compressible than its underlying chemical composition.
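For reference, the stress values compared above are of the Stress-1 type introduced by Kruskal (1964). In the notation assumed here, \(d_{ij}(X)\) is the distance between wines \(i\) and \(j\) in the fitted configuration \(X\), and \(\hat{d}_{ij}\) is the transformed dissimilarity (a rescaling for ratio MDS, a monotone transformation for ordinal MDS). Software implementations may normalize slightly differently, so this is a sketch of the general form, not a statement of the exact computation in `smacof`.

```latex
\[
\text{Stress-1} \;=\;
\sqrt{\frac{\sum_{i<j}\bigl(\hat{d}_{ij} - d_{ij}(X)\bigr)^{2}}
           {\sum_{i<j} d_{ij}(X)^{2}}}
\]
```

A value of 0 means the configuration reproduces the transformed dissimilarities exactly; larger values indicate a poorer fit at the chosen dimensionality.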
4.4 Stress per point (spp)
spp_space_A <- mds_space_A$spp
spp_space_B <- mds_space_B$spp
outliers_A <- which(spp_space_A > mean(spp_space_A) + sd(spp_space_A))
outliers_B <- which(spp_space_B > mean(spp_space_B) + sd(spp_space_B))
Stress-per-point diagnostics further illuminate differences between the two spaces. In the Euclidean MDS, several observations exhibit disproportionately high stress contributions, indicating wines whose chemical profiles are poorly represented in the low-dimensional configuration. These wines are likely characterized by extreme values in one or more chemical attributes.
In the Gower-based non-metric MDS, stress is more evenly distributed across observations, and fewer extreme outliers emerge. This suggests that perceptual similarity, as encoded through quality-informed distances, mitigates the destabilizing influence of chemically extreme cases.
From a perceptual standpoint, this finding implies that wines judged as similar in quality may tolerate greater chemical heterogeneity without disrupting the overall structure of the perceptual space.
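The suggestion that high-stress wines are chemically extreme can be checked directly. The sketch below, on simulated data with one deliberately extreme wine, lists the flagged observations alongside their largest absolute z-score. The toy data and the `max_abs_z` summary are assumptions added for illustration; the flagging rule (mean plus one standard deviation) is the one used in the report.

```r
library(smacof)

set.seed(11)
# Toy chemistry with one deliberately extreme wine in row 1 (illustrative only).
chem <- matrix(rnorm(100), nrow = 25)
chem[1, ] <- chem[1, ] + 6
chemical_scaled <- scale(chem)

fit <- mds(dist(chemical_scaled), ndim = 2, type = "ratio")

# spp: each wine's contribution to the overall stress.
spp <- fit$spp
flagged <- which(spp > mean(spp) + sd(spp))

# Largest absolute z-score per wine, to see whether flagged wines
# are also extreme on at least one attribute.
max_abs_z <- apply(abs(chemical_scaled), 1, max)
data.frame(wine = flagged, spp = spp[flagged], max_abs_z = max_abs_z[flagged])
```

If flagged wines do not show unusually large z-scores, their poor fit is more likely due to an unusual combination of attributes than to a single extreme value.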
4.5 Post-hoc interpretation of MDS axes (Space A only)
mds_coordinates_A <- as.data.frame(mds_space_A$conf)
colnames(mds_coordinates_A) <- c("MDS1", "MDS2")
axis_correlations <- apply(
  chemical_scaled,
  2,
  function(variable)
    c(
      cor(variable, mds_coordinates_A$MDS1),
      cor(variable, mds_coordinates_A$MDS2)
    )
)
axis_correlations <- as.data.frame(t(axis_correlations))
colnames(axis_correlations) <- c("Corr_MDS1", "Corr_MDS2")
These correlations do not define the MDS axes, but provide an a posteriori interpretation consistent with applied MDS literature.
5 Ideal point and preference mapping
To further explore the relationship between chemical composition and perceived quality, an empirical ideal point was constructed as the centroid of wines with high quality ratings (quality ≥ 7) in the chemical MDS space.
5.1 Preference Mapping and ideal point analysis
mds_coordinates_A$quality <- wine$quality
high_quality_index <- which(wine$quality >= 7)
ideal_point <- colMeans(
  mds_coordinates_A[high_quality_index, c("MDS1", "MDS2")]
)
ideal_point
##       MDS1       MDS2
## 0.09501708 0.36620402
The empirical ideal point is located at MDS1 = 0.095 and MDS2 = 0.366. It represents the average chemical location of highly rated wines in the MDS space rather than a unique or optimal chemical profile. This position indicates that highly rated wines, on average, occupy a moderately shifted location along the second chemical dimension while remaining relatively central along the first.
Wines located close to this ideal point exhibit chemical profiles broadly consistent with those associated with high quality. However, a non-negligible number of wines lie far from the ideal point while still receiving high quality scores. Conversely, some wines positioned near the chemical ideal display only moderate or low quality ratings.
This pattern highlights a key perceptual inconsistency: chemical proximity does not guarantee perceptual preference, and perceptual excellence can arise even in chemically atypical wines. Such discrepancies suggest the influence of latent sensory attributes, contextual factors, or nonlinear interactions among chemical components that are not captured by the measured variables.
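The inconsistency described above can be made concrete by measuring each wine's distance to the ideal point and cross-tabulating it against quality. The sketch below uses toy coordinates; the median cut-off separating "near" from "far" and the category labels are hypothetical choices, not part of the original analysis.

```r
# Toy MDS coordinates and quality scores (illustrative values only).
coords <- data.frame(
  MDS1    = c(0.10, 0.05, -0.90, 0.15, 1.20, -0.40),
  MDS2    = c(0.40, 0.30, -0.70, 0.35, 0.90, 0.10),
  quality = c(7, 5, 7, 6, 8, 5)
)

# Empirical ideal point: centroid of wines rated 7 or higher.
ideal_point <- colMeans(coords[coords$quality >= 7, c("MDS1", "MDS2")])

# Euclidean distance from each wine to the ideal point.
coords$dist_to_ideal <- sqrt((coords$MDS1 - ideal_point[["MDS1"]])^2 +
                             (coords$MDS2 - ideal_point[["MDS2"]])^2)

# Hypothetical cut-off: the median distance separates "near" from "far".
cutoff <- median(coords$dist_to_ideal)
coords$type <- with(coords, ifelse(
  quality >= 7 & dist_to_ideal > cutoff, "high quality, chemically distant",
  ifelse(quality < 7 & dist_to_ideal <= cutoff, "near ideal, not highly rated",
         "consistent")
))
coords
```

Counting the rows in each category gives a simple measure of how often chemical proximity and perceived quality disagree.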
5.2 Visualization with ideal point overlay
# One-row data frame so the ideal point and its label are drawn once,
# not once per wine.
ideal_df <- data.frame(MDS1 = ideal_point[["MDS1"]], MDS2 = ideal_point[["MDS2"]])
ggplot(mds_coordinates_A, aes(x = MDS1, y = MDS2)) +
  geom_point(aes(color = quality), alpha = 0.7) +
  geom_point(data = ideal_df, color = "red", size = 4) +
  geom_text_repel(data = ideal_df, aes(label = "Ideal Point"), color = "red") +
  labs(
    title = "Metric MDS with Ideal Quality Point",
    color = "Wine Quality"
  ) +
  theme_minimal()
This figure shows the metric MDS configuration after up-weighting wine
quality in the dissimilarity construction. Compared to the unweighted
solution, the empirical ideal point shifts upward along the second MDS
dimension, indicating that perceived quality is more strongly aligned
with this chemical gradient. High-quality wines are more frequently
observed in regions with higher MDS2 values, suggesting a systematic
association between quality and the underlying attributes represented by
this dimension.
Importantly, high-quality wines remain widely distributed across the perceptual space rather than forming a compact cluster around the ideal point. This indicates that while quality weighting strengthens the perceptual signal along a dominant direction, it does not reduce wine quality perception to a single chemical optimum.
Based on the preference map, three distinct types of wines can be identified.
First, high-quality but chemically distant wines receive high quality ratings (quality ≥ 7) while being located far from the empirical ideal point in the chemical MDS space. Visually, these wines appear as lighter-colored points (the default continuous ggplot2 color scale maps higher values to lighter shades) situated toward the edges of the configuration, indicating perceptual success despite atypical chemical profiles.
Second, chemically typical but moderately rated wines are positioned close to the ideal point yet achieve only medium quality scores (typically 5–6). These wines match the average chemical characteristics associated with high quality but do not translate this proximity into strong perceptual evaluations, highlighting the limits of purely composition-based explanations.
Third, the map reveals clusters of high-quality wines in different regions of the chemical space, where highly rated wines form loose groupings across multiple, well-separated areas rather than around a single center. Together, these patterns indicate that perceived wine quality does not correspond to a unique chemical optimum but instead arises through multiple chemically distinct pathways, reinforcing the idea that perceptual excellence cannot be reduced to physicochemical similarity alone.
5.3 Dual-space contrast
By explicitly contrasting a purely chemical similarity space with a perceptually enriched space incorporating wine quality, this study provides empirical evidence that perceived wine quality cannot be fully inferred from physicochemical proximity alone. Distance heatmaps reveal that chemical similarity is largely continuous and weakly structured, whereas quality-informed similarity reorganizes wines into more coherent perceptual groupings.
This structural difference is confirmed quantitatively by the stress scree plot, where the Gower-based perceptual space consistently achieves lower stress than the Euclidean chemical space at comparable dimensionality, particularly in low-dimensional representations. These results indicate that perceptual similarity is more compressible and cognitively parsimonious than similarity defined solely by chemical composition.
Preference mapping further demonstrates that high-quality wines are not concentrated around a single chemical optimum. The empirical ideal point summarizes an average chemical position of highly rated wines, yet high-quality wines are widely distributed across the MDS space, including regions far from this centroid. The coexistence of chemically typical but moderately rated wines and chemically atypical yet highly rated wines highlights a systematic divergence between chemical proximity and perceived quality.
Together, these findings suggest that high perceived wine quality arises through multiple chemically distinct pathways and is influenced by latent sensory or perceptual factors not fully captured by standard physicochemical measurements. Methodologically, the results underscore the value of non-metric MDS and flexible distance definitions for studying perceptual organization in complex products, while substantively emphasizing the multidimensional and partially unobservable nature of wine quality perception.
6 References
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547–553.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857–871.
Johnson, T. E., Hasted, A., Ristic, R., & Bastian, S. E. P. (2013). Multidimensional scaling, cluster and descriptive analyses provide preliminary insights into Australian Shiraz wine regional characteristics. Food Quality and Preference, 29(2), 174–185.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a non-metric hypothesis. Psychometrika, 29(1), 1–27.
Sánchez, C. N., et al. (2019). Liking product landscape: Going deeper into preference mapping. Foods, 8(10), 461.