Introduction

High-dimensional sensory datasets are difficult to interpret due to complex interdependencies between variables. Dimension reduction techniques provide a structured way to uncover latent structure while preserving essential information.

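This study compares Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) using coffee sensory evaluation data. PCA finds orthogonal directions that successively maximize explained variance, whereas MDS seeks a low-dimensional configuration whose pairwise distances approximate the distances between observations in the original space. The goal is to examine whether these conceptually different approaches produce distinct geometric representations, and how sensitive the MDS configuration is to the choice of distance metric.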

Data Preparation

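library(dplyr)       # %>% and select()
library(ggplot2)
library(corrplot)
library(factoextra)  # fviz_eig(), fviz_pca_var()

species_palette <- c("Arabica" = "#1f77b4", "Robusta" = "#d62728")
# Fallback in case coffee_theme is not defined elsewhere in the report
if (!exists("coffee_theme")) coffee_theme <- theme_minimal()
coffee <- read.csv("merged_data_cleaned.csv")

sensory_vars <- coffee %>%
  select(Aroma, Flavor, Aftertaste, Acidity, Body, Balance,
         Uniformity, Clean.Cup, Sweetness,
         Cupper.Points, Total.Cup.Points, Moisture)
species <- coffee$Species
sensory_scaled <- scale(sensory_vars)  # mean 0, sd 1 for each variable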
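Since `sensory_scaled` is already standardized, the PCA step later calls `prcomp()` with `center = FALSE, scale. = FALSE`. The sketch below, on simulated data rather than the coffee file (the matrix `x` and the object names are hypothetical), checks that this gives the same decomposition as letting `prcomp()` standardize the raw data itself.

```r
# Simulated data only; 'x' is a hypothetical stand-in for sensory_vars
set.seed(42)
x <- matrix(rnorm(200 * 4, mean = 5, sd = 2), ncol = 4)

# Option 1: standardize first, then run PCA without further centering/scaling
pca_prescaled <- prcomp(scale(x), center = FALSE, scale. = FALSE)
# Option 2: let prcomp() center and scale the raw data itself
pca_internal <- prcomp(x, center = TRUE, scale. = TRUE)

# Same component standard deviations and the same scores (compared up to sign)
all.equal(pca_prescaled$sdev, pca_internal$sdev)
all.equal(abs(pca_prescaled$x), abs(pca_internal$x), check.attributes = FALSE)
```

Either form is fine; standardizing once up front simply lets the same `sensory_scaled` matrix feed both the correlation plot and the distance computations used for MDS.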

Correlation Structure

cor_matrix <- cor(sensory_scaled)

corrplot(cor_matrix,
         method = "color",
         type = "upper",
         tl.cex = 0.7,
         tl.col = "black",
         order = "hclust")

The correlation matrix reveals a clear block structure among flavor-related attributes such as Flavor, Aftertaste, Aroma, Body, Balance, and Total Cup Points, all of which exhibit strong positive correlations. This clustering pattern suggests that these variables move together along a common quality gradient.

In contrast, Moisture shows weaker and in some cases slightly negative associations with the main sensory attributes, indicating that it behaves differently from the core quality-related variables. Overall, the strong positive interdependencies support the presence of a dominant latent quality dimension underlying the sensory data.
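To make the "common quality gradient" reading concrete, here is a small simulation under an assumed one-factor model: five hypothetical attributes (`attr1` to `attr5`) share a latent `quality` score plus noise, and a sixth variable standing in for Moisture is generated independently of it. It is a sketch of the mechanism, not a model fitted to the coffee data.

```r
# Hypothetical one-factor simulation, not the coffee data
set.seed(1)
n <- 1000
quality <- rnorm(n)                                   # latent quality gradient
flavor_block <- sapply(1:5, function(i) quality + rnorm(n, sd = 0.6))
colnames(flavor_block) <- paste0("attr", 1:5)
moisture_like <- rnorm(n)                             # unrelated to quality

sim <- cbind(flavor_block, moisture = moisture_like)
r <- cor(sim)

# Average off-diagonal correlation inside the shared-factor block
block <- r[1:5, 1:5]
mean_block_cor <- mean(block[upper.tri(block)])
# Correlations between the independent variable and the block
moisture_cor <- r["moisture", 1:5]

round(mean_block_cor, 2)
round(moisture_cor, 2)
```

A single shared factor is enough to produce a block of strong positive correlations, while a variable outside that factor shows correlations near zero, which is the pattern the corrplot displays for Moisture.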

PCA (Principal Component Analysis)

The PCA results indicate that the first principal component (PC1) captures 58.85% of the total variance, pointing to a strong underlying latent structure in the sensory data. The second component (PC2) contributes 12.60% and the third (PC3) 7.65%. Together, the first two components capture 71.45% of the total variance, so a two-dimensional representation already provides a substantial summary of the data, and the cumulative proportion exceeds 84% by the fourth component. At the other end, PC12 has a standard deviation of only 0.004; its loadings set Total Cup Points against the ten individual cupping attributes, with Moisture near zero, which indicates that Total Cup Points is almost an exact linear combination of those attributes and carries little independent information.
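The percentages above follow directly from the component standard deviations printed by `summary(pca_model)`: each eigenvalue is a squared standard deviation, and because all 12 variables are standardized to unit variance, the eigenvalues sum to 12. The sketch below recomputes the proportions from the printed (rounded) values, so the last digit can differ slightly from the summary.

```r
# Standard deviations copied from the summary(pca_model) output
sdev <- c(2.6574, 1.2299, 0.95794, 0.7676, 0.69174, 0.60750,
          0.55903, 0.51504, 0.46948, 0.41925, 0.31207, 0.004182)

eigenvalues <- sdev^2                 # variance carried by each PC
total_var   <- sum(eigenvalues)       # about 12 = number of standardized variables
prop_var    <- eigenvalues / total_var
cum_var     <- cumsum(prop_var)

round(total_var, 2)
round(100 * prop_var[1:3], 2)
round(100 * cum_var[c(2, 4)], 2)
```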

Examining the loadings, the flavor-related variables such as Aroma (−0.318), Flavor (−0.349), Aftertaste (−0.345), Acidity (−0.322), and Total Cup Points (−0.369) show relatively large and similar contributions to PC1, and Body, Balance, and Cupper Points load at a comparable level. Because all quality attributes share the same sign while Moisture alone is positive (0.066), PC1 represents a global sensory quality dimension; the sign of a component is arbitrary, so this simply means that higher-quality coffees receive more negative PC1 scores. In contrast, PC2 is most strongly associated with Sweetness (0.536), Clean Cup (0.474), and Uniformity (0.459), so this component captures a secondary structure related to cleanliness and sweetness attributes; Moisture also loads moderately on PC2 (0.340). Moisture dominates PC3 (−0.922), suggesting that it behaves differently from the core flavor attributes and forms a largely independent dimension.
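Because each column of `pca_model$rotation` has unit length, squaring the loadings and multiplying by 100 gives the percentage contribution of each variable to a component (to my understanding, this is also the quantity `factoextra` reports as a variable contribution for a `prcomp` fit). A short sketch using the PC1 loadings printed in the rotation output:

```r
# PC1 loadings copied from the pca_model$rotation output
pc1 <- c(Aroma = -0.31817882, Flavor = -0.34874830, Aftertaste = -0.34480415,
         Acidity = -0.32201603, Body = -0.30660900, Balance = -0.33008061,
         Uniformity = -0.21210784, Clean.Cup = -0.20162707,
         Sweetness = -0.16270840, Cupper.Points = -0.31505314,
         Total.Cup.Points = -0.36866953, Moisture = 0.06555463)

# Squared loadings sum to 1, so 100 * loading^2 is a percentage contribution;
# squaring also removes the arbitrary sign of the component.
contrib_pc1 <- 100 * pc1^2
sort(round(contrib_pc1, 1), decreasing = TRUE)
```

The contributions are spread fairly evenly across the quality attributes, with Moisture contributing well under 1%, which is what one would expect from a general quality axis.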

pca_model <- prcomp(sensory_scaled, center = FALSE, scale. = FALSE)

summary(pca_model)
## Importance of components:
##                           PC1    PC2     PC3    PC4     PC5     PC6     PC7
## Standard deviation     2.6574 1.2299 0.95794 0.7676 0.69174 0.60750 0.55903
## Proportion of Variance 0.5885 0.1260 0.07647 0.0491 0.03988 0.03075 0.02604
## Cumulative Proportion  0.5885 0.7145 0.79099 0.8401 0.87997 0.91072 0.93676
##                            PC8     PC9    PC10    PC11     PC12
## Standard deviation     0.51504 0.46948 0.41925 0.31207 0.004182
## Proportion of Variance 0.02211 0.01837 0.01465 0.00812 0.000000
## Cumulative Proportion  0.95887 0.97724 0.99188 1.00000 1.000000
pca_model$rotation
##                          PC1        PC2          PC3          PC4         PC5
## Aroma            -0.31817882 -0.1319324 -0.110760024  0.016218215  0.02549520
## Flavor           -0.34874830 -0.1225097 -0.096349898  0.039612497  0.05911413
## Aftertaste       -0.34480415 -0.1431004 -0.055091052  0.052051398  0.04949966
## Acidity          -0.32201603 -0.1463529 -0.135952501 -0.066854710 -0.10307270
## Body             -0.30660900 -0.1558051 -0.102739921 -0.241671021 -0.07213774
## Balance          -0.33008061 -0.1237548 -0.002274809 -0.050061222 -0.01851892
## Uniformity       -0.21210784  0.4592216  0.148943787  0.285175301 -0.78330649
## Clean.Cup        -0.20162707  0.4744611  0.254095386  0.465121905  0.57417639
## Sweetness        -0.16270840  0.5356531  0.085069750 -0.767702415  0.13025996
## Cupper.Points    -0.31505314 -0.1484964 -0.020887098  0.163363899  0.11416728
## Total.Cup.Points -0.36866953  0.1591176  0.036137597  0.007521486  0.03349051
## Moisture          0.06555463  0.3403030 -0.922302752  0.127684133  0.05351153
##                           PC6         PC7           PC8         PC9
## Aroma             0.256613251  0.68764326 -0.4704940808 -0.04501265
## Flavor            0.178182284  0.07674974  0.0603175684  0.05034773
## Aftertaste        0.099795200 -0.04147535  0.0470794727  0.21950291
## Acidity           0.087740161  0.15743860  0.8180068678 -0.09654110
## Body             -0.715907563 -0.02698049 -0.1491580932 -0.50510032
## Balance          -0.325859695 -0.24538782 -0.1365644106  0.74021714
## Uniformity        0.011933051 -0.01472985 -0.0571025363 -0.03347511
## Clean.Cup        -0.231186622  0.11074897  0.1080918128 -0.06069239
## Sweetness         0.205440372 -0.06051772 -0.0259197695  0.01736866
## Cupper.Points     0.410656073 -0.64223661 -0.2145368931 -0.35741727
## Total.Cup.Points -0.001986957 -0.01184465 -0.0006012022 -0.01459578
## Moisture         -0.039966977 -0.07108534 -0.0332356624  0.04270291
##                          PC10         PC11          PC12
## Aroma             0.286242505  0.120096483 -9.860467e-02
## Flavor           -0.435516968 -0.777639078 -1.039661e-01
## Aftertaste       -0.645093903  0.602496567 -1.058879e-01
## Acidity           0.334949317  0.091559148 -9.897016e-02
## Body             -0.086705671  0.007501729 -9.670181e-02
## Balance           0.349540626 -0.079145487 -1.070684e-01
## Uniformity       -0.034197323 -0.007446221 -1.451963e-01
## Clean.Cup         0.050865794  0.012144603 -1.995920e-01
## Sweetness        -0.009950068  0.016112157 -1.607171e-01
## Cupper.Points     0.257285816  0.047587511 -1.232563e-01
## Total.Cup.Points  0.013267113  0.006455946  9.141690e-01
## Moisture          0.011344226  0.019494488  2.905536e-06
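The near-zero variance of PC12 has a simple explanation if Total Cup Points is (almost) the sum of the ten cupping attributes, as its PC12 loadings suggest. The simulation below assumes an exact sum and uses hypothetical attributes `attr1` to `attr10`; under that assumption the last component carries no variance, and its loadings oppose `total` to every part, the same pattern as in the PC12 column above.

```r
# Hypothetical data: ten attribute scores plus a total that is their exact sum
set.seed(7)
n <- 300
quality <- rnorm(n)
parts <- sapply(1:10, function(i) 7.5 + 0.3 * quality + rnorm(n, sd = 0.2))
colnames(parts) <- paste0("attr", 1:10)
total <- rowSums(parts)

pca_sim <- prcomp(cbind(parts, total = total), scale. = TRUE)

# The last component has (numerically) zero variance ...
tail(pca_sim$sdev, 1)
# ... and its loadings put 'total' on the opposite side of every part
last_pc <- pca_sim$rotation[, ncol(pca_sim$rotation)]
round(last_pc, 3)
```

In the real data the relationship is only approximate (PC12 still has a small nonzero standard deviation), but the practical consequence is the same: Total Cup Points adds a redundant copy of information already carried by the individual attributes.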
# Scree plot
fviz_eig(pca_model, addlabels = TRUE) +
  coffee_theme +
  labs(
    title = "Scree Plot",
    x = "Principal Component",
    y = "Explained Variance (%)"
  )

# Loading plot
fviz_pca_var(pca_model, repel = TRUE) +
  coffee_theme +
  labs(
    title = "PCA Variable Loadings",
    subtitle = "Directions indicate how sensory attributes contribute to components"
  )

# PCA scores
pca_scores <- as.data.frame(pca_model$x)
pca_scores$Species <- species

ggplot(pca_scores, aes(PC1, PC2, color = Species, fill = Species)) +
  geom_point(alpha = 0.55, size = 2) +
  stat_ellipse(type = "norm", level = 0.95, geom = "polygon", alpha = 0.12, color = NA) +
  stat_ellipse(type = "norm", level = 0.95, linewidth = 0.7) +
  scale_color_manual(values = species_palette, na.value = "grey50") +
  scale_fill_manual(values = species_palette, na.value = "grey50") +
  coffee_theme +
  labs(
    title = "PCA Projection of Coffee Sensory Data",
    subtitle = "Ellipses show 95% concentration by Species",
    x = "PC1",
    y = "PC2"
  )

I observe that coffee sensory evaluations do not form strictly distinct groups but instead align along a continuous quality dimension. The dominance of the first principal component highlights how strongly correlated the flavor-related attributes are. Therefore, dimensionality reduction provides a meaningful and interpretable summary of the sensory structure.

MDS (Multidimensional Scaling)

Euclidean MDS

dist_matrix <- dist(sensory_scaled, method = "euclidean")
mds_model <- cmdscale(dist_matrix, k = 2, eig = TRUE)
mds_scores <- as.data.frame(mds_model$points)
colnames(mds_scores) <- c("Dim1", "Dim2")
mds_scores$Species <- species

ggplot(mds_scores, aes(Dim1, Dim2, color = Species, fill = Species)) +
  geom_point(alpha = 0.55, size = 2) +
  stat_ellipse(type = "norm", level = 0.95, geom = "polygon", alpha = 0.12, color = NA) +
  stat_ellipse(type = "norm", level = 0.95, linewidth = 0.7) +
  scale_color_manual(values = species_palette, na.value = "grey50") +
  scale_fill_manual(values = species_palette, na.value = "grey50") +
  coffee_theme +
  labs(
    title = "Euclidean MDS",
    x = "Dimension 1",
    y = "Dimension 2"
  )

Manhattan MDS

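dist_matrix_manhattan <- dist(sensory_scaled, method = "manhattan")
# eig = TRUE also returns eigenvalues and goodness of fit, as in the Euclidean case
mds_model_manhattan <- cmdscale(dist_matrix_manhattan, k = 2, eig = TRUE)

mds_scores_manhattan <- as.data.frame(mds_model_manhattan$points)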
colnames(mds_scores_manhattan) <- c("Dim1", "Dim2")
mds_scores_manhattan$Species <- species

ggplot(mds_scores_manhattan, aes(Dim1, Dim2, color = Species, fill = Species)) +
  geom_point(alpha = 0.55, size = 2) +
  stat_ellipse(type = "norm", level = 0.95, geom = "polygon", alpha = 0.12, color = NA) +
  stat_ellipse(type = "norm", level = 0.95, linewidth = 0.7) +
  scale_color_manual(values = species_palette, na.value = "grey50") +
  scale_fill_manual(values = species_palette, na.value = "grey50") +
  coffee_theme +
  labs(
    title = "Manhattan MDS",
    x = "Dimension 1",
    y = "Dimension 2"
  )

From my perspective, both Euclidean and Manhattan MDS confirm that coffee sensory evaluations are structured along a continuous dimension rather than forming sharply separated groups. Even though Manhattan distance stretches the geometry and slightly emphasizes species differences, the fundamental pattern does not change. Therefore, I interpret the sensory space as being driven more by gradual variation in quality attributes than by strict categorical separation.
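Classical MDS (`cmdscale()`) assumes the distances can be embedded in a Euclidean space. Manhattan distances need not satisfy this, in which case the double-centered matrix has negative eigenvalues and the two-dimensional map is a less faithful summary. One way to check this is to request `eig = TRUE` and compare the two goodness-of-fit values in `GOF`. The sketch below does so on simulated, standardized data (the matrix `x` and the helper `n_negative()` are hypothetical); running the same calls on `sensory_scaled` would show how much the Manhattan map loses on the actual coffee data.

```r
set.seed(3)
x <- scale(matrix(rnorm(60 * 4), ncol = 4))  # hypothetical standardized data

# eig = TRUE returns all eigenvalues and the goodness-of-fit pair GOF
mds_euc <- cmdscale(dist(x, method = "euclidean"), k = 2, eig = TRUE)
mds_man <- cmdscale(dist(x, method = "manhattan"), k = 2, eig = TRUE)

# GOF[1] divides the kept eigenvalues by the sum of |all eigenvalues|,
# GOF[2] by the sum of the positive ones; they differ only when some
# eigenvalues are negative (a non-Euclidean distance matrix).
round(mds_euc$GOF, 3)
round(mds_man$GOF, 3)

# Number of clearly negative eigenvalues (relative tolerance)
n_negative <- function(ev) sum(ev < -1e-8 * max(abs(ev)))
n_negative(mds_euc$eig)
n_negative(mds_man$eig)
```

For Euclidean distances the two GOF values coincide and no eigenvalue is meaningfully negative; a gap between the Manhattan GOF values is a direct measure of how much of that geometry cannot be represented exactly.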

Comparative Discussion

When the Euclidean metric is applied, variance-oriented PCA and distance-based MDS produce the same configuration: classical MDS on Euclidean distances between standardized (centered) observations reproduces the PCA scores up to a reflection of each axis, so the similarity of the two plots is expected by construction rather than an empirical finding. The configuration does, however, depend on the choice of distance metric, because different metrics alter the geometric relationships among observations. Although the Manhattan distance slightly improves the separation between species, the overlap remains considerable, indicating that the groups are not strongly distinct in the reduced-dimensional space.
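The equivalence between PCA and Euclidean classical MDS can be checked numerically. The sketch below uses a hypothetical standardized matrix `z` in place of `sensory_scaled`, computes the first two PCA scores and the two-dimensional `cmdscale()` solution, aligns the sign of each MDS axis with the corresponding PC, and reports the largest absolute difference, which should be at floating-point level.

```r
set.seed(11)
# Hypothetical correlated data, standardized like sensory_scaled
z <- scale(matrix(rnorm(150 * 5), ncol = 5) %*% matrix(runif(25), nrow = 5))

pca_xy <- prcomp(z, center = FALSE, scale. = FALSE)$x[, 1:2]
mds_xy <- cmdscale(dist(z, method = "euclidean"), k = 2)

# Each axis is only defined up to sign: align MDS axes with the PCs
axis_sign <- sign(colSums(pca_xy * mds_xy))
mds_aligned <- sweep(mds_xy, 2, axis_sign, `*`)

max_diff <- max(abs(pca_xy - mds_aligned))
max_diff
```

The reason is algebraic: double-centering the squared Euclidean distances of centered data recovers the matrix of inner products, whose eigenvectors scaled by the square roots of the eigenvalues are exactly the principal component scores.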

Conclusion

To sum up, this study applied Principal Component Analysis (PCA) and classical Multidimensional Scaling (MDS) to explore the structure of the sensory evaluation variables. The first principal component accounts for 58.85% of the total variance, indicating that a dominant latent dimension drives most of the variation in the dataset. The second and third components account for 12.60% and 7.65% of the variance, respectively, so the first three components together explain about 79% (79.10%) of the total variability, and most of the information in the dataset can be represented in a reduced-dimensional space. The loadings show that several sensory attributes move in similar directions, indicating shared patterns across evaluation criteria; the sensory scores are therefore not independent of each other but are structured around broader latent characteristics. The MDS configurations lead to the same picture: Euclidean MDS matches the PCA solution, and Manhattan MDS changes the geometry only slightly, with substantial overlap between Arabica and Robusta remaining.

Overall, PCA provides a clear and interpretable low-dimensional representation of the data, making it easier to understand the main variation patterns without losing substantial information.