library(erikmisc)
library(tidyverse)
ggplot2::theme_set(ggplot2::theme_bw()) # set theme_bw for all plots
Principal Components Analysis of Prehistoric Goblets from Thailand
Prehistoric goblets of Thailand
PCA was used to study how six measurements relate to each other in 25 ancient Thai goblets: mouth width (MouthW), total width (TotalW), total height (TotalH), base width (BaseW), stem width (StemW), and stem height (StemH). The analysis reveals which dimensions tend to vary together and which vary independently. The image below gives an example of a goblet, possibly not from prehistoric Thailand.
Abstract
Principal Components Analysis (PCA) was applied to six morphological measurements of 25 prehistoric goblets from Thailand to explore patterns of variation in size and shape. The analysis revealed that the primary source of variation (71.2%) was attributable to overall size differences, captured by the first principal component (PC1). To isolate shape variation, measurements were standardized by their sum, and subsequent PCA on these size-adjusted proportions identified three meaningful components explaining 89.5% of the variance. These components highlighted trade-offs in goblet proportions, such as between mouth width and stem height. Visualization of the principal components further illustrated clustering patterns, suggesting distinct morphological groups. The results demonstrate the utility of PCA for reducing dimensionality and interpreting archaeological artifact variation.
1. Introduction
Prehistoric artifacts often exhibit morphological diversity reflecting functional, cultural, or manufacturing differences. This study analyzes six measurements—mouth width (MouthW), total width (TotalW), total height (TotalH), base width (BaseW), stem width (StemW), and stem height (StemH)—from 25 goblets excavated in Thailand. PCA was employed to:
Quantify major axes of variation.
Distinguish size-driven variation from proportional (shape) differences.
Identify potential subgroups based on morphological features.
2. Methods
2.1 Data
Measurements (in cm) were collected for 25 goblets. The raw data were first analyzed using PCA on the correlation matrix to account for scale differences.
2.2 Size Adjustment
To focus on shape, each measurement was standardized by the sum of all measurements per goblet (e.g., MouthWs = MouthW / (MouthW + TotalW + ... + StemH)), creating proportional variables that sum to 1.
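As a minimal sketch of this adjustment, the block below applies it to the first three goblets (values copied from the head() output in the Results). The object names `goblets`, `size`, and `props` are illustrative and do not appear in the original analysis.

```r
# First three goblets, copied from the head() output shown in the Results
goblets <- data.frame(
  MouthW = c(13, 14, 19),
  TotalW = c(21, 14, 23),
  TotalH = c(24, 24, 24),
  BaseW  = c(14, 19, 20),
  StemW  = c(7, 5, 6),
  StemH  = c(8, 9, 12)
)

# "Size" of each goblet: the sum of its six measurements
size <- rowSums(goblets)

# Divide every measurement in a row by that row's size
props <- sweep(as.matrix(goblets), 1, size, "/")
colnames(props) <- paste0(colnames(goblets), "s")

print(round(props, 3))
print(rowSums(props))  # each row of proportions sums to 1
```

Using the row sum as the size measure is one of several possible choices; it has the side effect that the six proportions are no longer free to vary independently.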
2.3 PCA Implementation
PCA was performed using princomp() in R, with analyses conducted on both raw and size-adjusted data. Loadings, variance explained, and scree plots were examined to interpret components.
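To make explicit what PCA on the correlation matrix computes, the sketch below checks that the component standard deviations from princomp(cor = TRUE) equal the square roots of the eigenvalues of the correlation matrix. It uses the built-in USArrests data purely as a stand-in (an assumption for illustration, since the goblet file is not bundled here).

```r
# Stand-in data (assumption): any numeric data set works for this check
X <- as.matrix(USArrests)

# PCA on the correlation matrix, as in the analysis
pca <- princomp(X, cor = TRUE)

# Eigen decomposition of the correlation matrix
eig <- eigen(cor(X), symmetric = TRUE)

# Component standard deviations are the square roots of the eigenvalues
sdev_manual <- sqrt(eig$values)
max_diff <- max(abs(sdev_manual - unname(pca$sdev)))
print(max_diff)  # expected to be at the level of floating-point round-off

# With p standardized variables the eigenvalues sum to p
print(sum(eig$values))
```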
3. Results
The Principal Components Analysis (PCA) of six morphological measurements from 25 prehistoric Thai goblets revealed clear patterns of variation in both size and shape. The first principal component (PC1) accounted for 71.2% of the total variance and showed uniformly positive loadings across all measurements (0.30-0.46), clearly representing overall goblet size. This dominant component indicates that size variation was the primary source of differences among these artifacts.
Subsequent components captured more subtle aspects of shape variation. PC2 explained 18.2% of variance and contrasted stem width and mouth width (positive loadings of 0.68 and 0.49) with total height and stem height (negative loadings of -0.44 and -0.30), suggesting that, at a given overall size, goblets with wider stems and mouths tend to be shorter. PC3 (6.5% of variance) contrasted mouth width and stem height (0.61 and 0.34) with stem width, total width, and total height (-0.49, -0.37, and -0.32). The proportion of variance drops to 2.4% for PC4, so the first three components capture the most meaningful patterns while accounting for 95.8% of cumulative variance.
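The percentages quoted above follow directly from the component standard deviations reported by summary(goblets_pca). A minimal sketch, with the standard deviations copied from that output (the vector name `sdev` is illustrative):

```r
# Component standard deviations copied from the summary(goblets_pca) output
sdev <- c(2.0663218, 1.0445535, 0.62242573, 0.37725654, 0.25547576, 0.210280768)

# Each component's variance is its squared standard deviation
var_comp <- sdev^2

# On the correlation scale the variances sum to the number of variables (6)
total_var <- sum(var_comp)

# Proportion and cumulative proportion of variance
prop_var <- var_comp / total_var
cum_var  <- cumsum(prop_var)

print(round(rbind(prop_var, cum_var), 3))
```

The first three entries of `cum_var` correspond to the 71.2%, 89.3%, and 95.8% cumulative figures in the summary table.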
# First, download the data to your computer,
# save in the same folder as this qmd file.
# read the data
dat_goblets <-
  read_csv(
    "ADA2_CL_19_goblets.csv"
    # rename columns from x1-x6 to meaningful names
  , skip = 1
  , col_names = c("MouthW", "TotalW", "TotalH", "BaseW", "StemW", "StemH")
  )
Rows: 25 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (6): MouthW, TotalW, TotalH, BaseW, StemW, StemH
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(dat_goblets, 3)
# A tibble: 3 × 6
MouthW TotalW TotalH BaseW StemW StemH
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 13 21 24 14 7 8
2 14 14 24 19 5 9
3 19 23 24 20 6 12
goblets_pca <-
  princomp(
    ~ MouthW + TotalW + TotalH + BaseW + StemW + StemH
  , data = dat_goblets
  , cor = TRUE
  )
summary(goblets_pca)
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
Standard deviation 2.0663218 1.0445535 0.62242573 0.37725654 0.25547576
Proportion of Variance 0.7116143 0.1818487 0.06456896 0.02372042 0.01087798
Cumulative Proportion 0.7116143 0.8934630 0.95803194 0.98175236 0.99263033
Comp.6
Standard deviation 0.210280768
Proportion of Variance 0.007369667
Cumulative Proportion 1.000000000
print(loadings(goblets_pca), cutoff = 0) # to show all values
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
MouthW 0.366 0.487 0.612 0.336 0.278 0.252
TotalW 0.452 -0.036 -0.373 0.663 -0.099 -0.453
TotalH 0.411 -0.442 -0.322 -0.005 0.386 0.619
BaseW 0.462 -0.114 0.164 -0.545 0.386 -0.548
StemW 0.297 0.682 -0.492 -0.360 -0.217 0.167
StemH 0.438 -0.297 0.336 -0.141 -0.753 0.141
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
SS loadings 1.000 1.000 1.000 1.000 1.000 1.000
Proportion Var 0.167 0.167 0.167 0.167 0.167 0.167
Cumulative Var 0.167 0.333 0.500 0.667 0.833 1.000
# create MouthWs-StemHs within the goblets data.frame
dat_goblets <-
  dat_goblets |>
  mutate(
    MouthWs = MouthW / (MouthW + TotalW + TotalH + BaseW + StemW + StemH)
  , TotalWs = TotalW / (MouthW + TotalW + TotalH + BaseW + StemW + StemH)
  , TotalHs = TotalH / (MouthW + TotalW + TotalH + BaseW + StemW + StemH)
  , BaseWs = BaseW / (MouthW + TotalW + TotalH + BaseW + StemW + StemH)
  , StemWs = StemW / (MouthW + TotalW + TotalH + BaseW + StemW + StemH)
  , StemHs = StemH / (MouthW + TotalW + TotalH + BaseW + StemW + StemH)
  )
str(dat_goblets)
tibble [25 × 12] (S3: tbl_df/tbl/data.frame)
$ MouthW : num [1:25] 13 14 19 17 19 12 12 12 11 11 ...
$ TotalW : num [1:25] 21 14 23 18 20 20 19 22 15 13 ...
$ TotalH : num [1:25] 24 24 24 16 16 24 22 25 17 14 ...
$ BaseW : num [1:25] 14 19 20 16 16 17 16 15 11 11 ...
$ StemW : num [1:25] 7 5 6 11 10 6 6 7 6 7 ...
$ StemH : num [1:25] 8 9 12 8 7 9 10 7 5 4 ...
$ MouthWs: num [1:25] 0.149 0.165 0.183 0.198 0.216 ...
$ TotalWs: num [1:25] 0.241 0.165 0.221 0.209 0.227 ...
$ TotalHs: num [1:25] 0.276 0.282 0.231 0.186 0.182 ...
$ BaseWs : num [1:25] 0.161 0.224 0.192 0.186 0.182 ...
$ StemWs : num [1:25] 0.0805 0.0588 0.0577 0.1279 0.1136 ...
$ StemHs : num [1:25] 0.092 0.1059 0.1154 0.093 0.0795 ...
# Correlation matrix
#dat_goblets |> select(MouthWs:StemHs) |> cor() |> print(digits = 3)
# Scatterplot matrix
library(ggplot2)
library(GGally)
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
p <- ggpairs(dat_goblets |> select(MouthWs:StemHs))
print(p)
goblets_pca_s <-
  princomp(
    ~ MouthWs + TotalWs + TotalHs + BaseWs + StemWs + StemHs
  , data = dat_goblets
  , cor = TRUE
  )
summary(goblets_pca_s)
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
Standard deviation 1.741076 1.2790076 0.8370514 0.64427244 0.4658901
Proportion of Variance 0.505224 0.2726434 0.1167758 0.06918116 0.0361756
Cumulative Proportion 0.505224 0.7778674 0.8946432 0.96382440 1.0000000
Comp.6
Standard deviation 1.049573e-08
Proportion of Variance 1.836008e-17
Cumulative Proportion 1.000000e+00
print(loadings(goblets_pca_s), cutoff = 0) # to show all values
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
MouthWs 0.474 0.156 0.607 0.219 0.075 0.574
TotalWs 0.358 -0.513 -0.192 -0.506 -0.474 0.302
TotalHs -0.375 -0.506 -0.262 0.489 0.176 0.515
BaseWs -0.418 0.466 -0.022 0.088 -0.713 0.303
StemWs 0.297 0.485 -0.680 -0.099 0.297 0.340
StemHs -0.493 0.059 0.252 -0.663 0.378 0.328
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
SS loadings 1.000 1.000 1.000 1.000 1.000 1.000
Proportion Var 0.167 0.167 0.167 0.167 0.167 0.167
Cumulative Var 0.167 0.333 0.500 0.667 0.833 1.000
screeplot(goblets_pca_s)
library(ggplot2)
p1 <- ggplot(as.data.frame(goblets_pca_s$scores), aes(x = Comp.1, y = Comp.2)) + geom_point()
p1 <- p1 + geom_text(aes(label = 1:nrow(goblets_pca_s$scores)), vjust = -0.5, alpha = 0.5)
p2 <- ggplot(as.data.frame(goblets_pca_s$scores), aes(x = Comp.1, y = Comp.3)) + geom_point()
p2 <- p2 + geom_text(aes(label = 1:nrow(goblets_pca_s$scores)), vjust = -0.5, alpha = 0.5)
p3 <- ggplot(as.data.frame(goblets_pca_s$scores), aes(x = Comp.2, y = Comp.3)) + geom_point()
p3 <- p3 + geom_text(aes(label = 1:nrow(goblets_pca_s$scores)), vjust = -0.5, alpha = 0.5)
library(gridExtra)
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
combine
grid.arrange(grobs = list(p1, p2, p3), nrow=1, top = "Scatterplots of first three PCs")
When analyzing size-adjusted proportions (where each measurement was divided by the sum of that goblet’s six measurements), the PCA results shifted to focus on shape relationships. The first three components explained 89.5% of variance (50.5%, 27.3%, and 11.7%, respectively), with PC1 contrasting the proportions of mouth width, total width, and stem width (positive loadings) with those of stem height, base width, and total height (negative loadings). The scatterplots of the component scores showed several goblets clustering together based on similar proportional features, while others appeared as outliers with distinct shape characteristics.
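The scatterplots above place each goblet at its component scores. As a hedged sketch of where such scores come from, the block below reproduces the scores returned by princomp() by standardizing the data and projecting it onto the loading vectors. The built-in USArrests data are used only as a stand-in (an assumption for illustration), and `scores_manual` is an illustrative name.

```r
# Stand-in data (assumption): replace with the goblet proportions in practice
X <- as.matrix(USArrests)
pca <- princomp(X, cor = TRUE)

# princomp() stores the centering and scaling values it applied
Z <- scale(X, center = pca$center, scale = pca$scale)

# Scores are the standardized data projected onto the loading vectors
scores_manual <- Z %*% unclass(pca$loadings)

max_diff <- max(abs(scores_manual - pca$scores))
print(max_diff)  # expected to be at the level of floating-point round-off
```

Note that princomp() scales by standard deviations computed with divisor n rather than n - 1, which is why the sketch reuses `pca$scale` instead of calling scale() with its defaults.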
Notably, the analysis confirmed that the six size-adjusted proportions contain one exact linear dependency (they must sum to 1), resulting in a sixth principal component with essentially zero variance (a standard deviation of about 1e-08, which is numerical round-off). This is the expected mathematical consequence of dividing by the sum, and it means that at most five components can carry shape information.
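This behavior does not depend on the goblet data. A minimal sketch with simulated measurements (hypothetical values drawn at random, not the goblet data) shows that proportions constrained to sum to 1 always produce a correlation matrix whose smallest eigenvalue is zero up to round-off:

```r
set.seed(1)

# Hypothetical positive "measurements": 25 objects, 6 dimensions
X <- matrix(runif(25 * 6, min = 5, max = 25), nrow = 25)

# Divide each row by its sum so the six proportions sum to 1
P <- X / rowSums(X)

# Eigenvalues of the correlation matrix of the proportions
eig_vals <- eigen(cor(P), symmetric = TRUE)$values
smallest <- eig_vals[6]

print(eig_vals)
```

The last eigenvalue is zero apart from floating-point error, matching the near-zero variance of the sixth component reported above.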
The component loadings and score plots collectively suggest that while most goblets followed general size-shape trends, a few specimens exhibited unique proportional combinations that may reflect different functional uses, manufacturing traditions, or temporal periods - potential avenues for future archaeological investigation. The clear separation of size and shape variation through PCA provides a strong foundation for more detailed classification and comparison of these artifacts.
4. Discussion
The PCA results provide valuable insights into the morphological variation among the prehistoric Thai goblets, revealing distinct patterns in both size and shape characteristics. The strong dominance of PC1, explaining 71.2% of total variance with uniformly positive loadings across all measurements, clearly establishes overall size as the primary source of variation in this collection. This finding aligns with expectations from artifact analyses, where size differences often represent the most substantial variation among objects of similar type and function.
The secondary components (PC2 and PC3) reveal more nuanced aspects of goblet morphology. The opposing signs of the width loadings (stem and mouth) and the height loadings (total and stem) on PC2 suggest potential functional or stylistic trade-offs in goblet design. This pattern may reflect different design priorities, for example a wider opening and stem versus a taller profile. Similarly, PC3’s contrast of mouth width and stem height against stem width, total width, and total height could indicate variations in manufacturing techniques or aesthetic preferences among ancient artisans.
The size-adjusted analysis isolated shape variation from overall size differences, yielding components that represent proportional relationships. The high variance explained by the first three shape components (89.5%) demonstrates that most meaningful variation can be captured through a reduced set of dimensions. The contrast in PC1 of the size-adjusted data, between the mouth, total, and stem width proportions on one side and the stem height, base width, and total height proportions on the other, suggests this may be a fundamental axis of shape variation in goblet design.
The clustering patterns observed in the score scatterplots indicate that while most goblets conform to general morphological trends, certain specimens deviate noticeably. These outliers may represent:
1) Specialized vessel types with unique functional requirements
2) Temporal variations in manufacturing traditions
3) Artifacts from different production centers or cultural groups
The exact linear dependency in the size-adjusted data (resulting in a sixth component with essentially zero variance) is an expected mathematical result of dividing each measurement by the sum of all six. It confirms that the proportions were computed as intended, and it is a reminder that the size-adjusted data have only five dimensions available for describing shape.
These findings have several important implications for archaeological research:
- The clear separation of size and shape variation supports more nuanced typological classifications
- The identified morphological patterns may correlate with functional differences in vessel use
- The outliers warrant further investigation as potential markers of chronological or cultural variation
Future research directions could include:
1) Expanding the sample size to validate these patterns across a broader collection
2) Incorporating contextual archaeological data to interpret the meaning of morphological clusters
3) Applying similar analyses to other artifact types to identify universal versus culture-specific patterns of variation
The success of PCA in this study demonstrates its continued value for archaeological morphology studies, particularly its ability to objectively identify major axes of variation that may not be apparent through traditional visual inspection or univariate analysis. The method’s capacity to separate size and shape components makes it particularly valuable for artifact studies where both dimensions may carry important cultural information.
5. Conclusion
PCA effectively reduced dimensionality and revealed that goblet variation is primarily driven by size, with secondary shape differences. Size adjustment enabled focus on proportional features, identifying distinct morphological patterns. This approach is applicable to other archaeological datasets for classifying artifacts and inferring cultural or functional trends.
References
Erhardt, E. B., Bedrick, E. J., & Schrader, R. M. (2020). Lecture notes for Advanced Data Analysis 2 (ADA2) (Stat 428/528). University of New Mexico.