Difficulties tracing and interpreting patterns in compositional data of metal artefacts

Why are the more complex methods not always useful?

Petr Pajdla (1)
Alžběta Danielisová (2)
Daniel Bursák (2)
Ladislav Strnad (3)
Jakub Trubač (3)

EAA 2020 Session 464: TRAUM – Tracing Reality in Archaeology Using Machine Learning
26th August 2020

1: Department of Archaeology and Museology, Masaryk University, Brno
2: Institute of Archaeology CAS, Prague, v.v.i.
3: Charles University, Prague

The case of the Duchcov hoard

  • originally (probably) more than 1000 metal objects
  • 4th century BC (early La Tène period)
  • bronze jewellery (brooches and bracelets) in a bronze cauldron
  • deposited in a natural spring

[Figure: cauldron]

Tracing reality in metal compositions

What do we have?

What reality are we tracing?

  • Origin of the metal artefacts.
  • What were the reasons for the deposition?
    • Protecting the goods?
    • 'Claiming' the land?

Background:
Discussion of Celtic migrations in the 4th century BC…

Metal provenancing

Compositional data

  • Detecting various elements present in the metal objects
  • Major elements:
    Cu, Sn
  • Trace elements:
    Li, Mg, Al, Mn, Fe, Pb, Sb, Ag, As, Zn, Ni, Co, Bi etc.

→ Multidimensional data

Problems:

  • Choosing the right (trace) elements for the analysis.
  • Contamination from the soil.
  • Probable repeated remelting and recasting of metal into new artefacts.

Isotopic data

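  • Measuring lead (Pb) isotope ratios
  • Different ore sources have different signals (depending on the age of the deposit etc.)
  • Most commonly used lead isotope ratios:
    \( ^{206}\mathrm{Pb}/^{204}\mathrm{Pb} \), \( ^{207}\mathrm{Pb}/^{204}\mathrm{Pb} \), \( ^{208}\mathrm{Pb}/^{204}\mathrm{Pb} \), \( ^{207}\mathrm{Pb}/^{206}\mathrm{Pb} \) and \( ^{208}\mathrm{Pb}/^{206}\mathrm{Pb} \)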
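
The last two ratios in this list follow arithmetically from the first three, because the common 204Pb denominator cancels. A minimal Python sketch of that conversion is given here; the function name and the input values are hypothetical and are not measurements from the hoard.

```python
def ratios_to_206(pb206_204, pb207_204, pb208_204):
    """Convert 204Pb-normalised ratios to 206Pb-normalised ratios.

    207Pb/206Pb = (207Pb/204Pb) / (206Pb/204Pb), and likewise for 208Pb.
    """
    return pb207_204 / pb206_204, pb208_204 / pb206_204


# Hypothetical values, for illustration only (not data from the slides).
pb207_206, pb208_206 = ratios_to_206(18.5, 15.65, 38.6)
print(round(pb207_206, 4), round(pb208_206, 4))
```

Because of this dependence, the five ratios carry only three independent dimensions of information.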

Problems:

  • Difficulties separating various sources with similar signals.
  • Mixing of ores with different origins.

Data

  • 49 samples from the Duchcov hoard:
    • 36 samples from brooches
    • 6 samples from the cauldron and rivets
    • 7 samples from bracelets

  • 1510 lead isotope data points (outliers removed), grouped by source region:

[Figure: lead isotope data points by source region]

Tasks

  • Find consistent groups in the element compositions of individual items in the hoard
  • Find the most probable source(s) of the lead isotope signal, i.e. the origin of the ore

[Figure: map]

Approach

Clustering problem

  • Dimensionality reduction and clustering on the element compositions
  • Groups of items with a similar element composition were probably made from a similar combination of ores using similar methods, probably by the same artisan or workshop…

Methods

  • k-means clustering
  • model-based clustering

Classification problem

  • Finding the origin of the lead in the bronze alloys
  • The source of the lead ore is also the most probable region where the artefacts were created.
  • Local or non-local production?

Methods

  • distance-based methods
    • Euclidean distance
    • Mahalanobis distance
  • linear discriminant analysis (LDA)
  • tree-based methods?

Clustering

K-means

  • Dimensionality reduction using PCA.
  • 4 clusters (number chosen using silhouette widths and the elbow method).

[Figure: k-means clusters]
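
How a silhouette criterion picks the number of clusters can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the slides (the analysis there was done in R): `pc_scores` is a hypothetical toy data set standing in for PCA scores, and the deterministic farthest-point initialisation is a simplification chosen so that the example is reproducible.

```python
import math


def farthest_point_init(points, k):
    """Start from the first point, then keep adding the point farthest from all centres."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centres = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else centres[j]
            )
        if updated == centres:
            break
        centres = updated
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; higher means tighter, better separated clusters."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: silhouette width is defined as 0
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters
            if other != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy scores forming three tight groups (not data from the hoard).
pc_scores = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),
    (10.0, 0.0), (10.1, 0.2), (9.9, -0.1), (10.2, -0.2),
]
silhouettes = {k: mean_silhouette(pc_scores, kmeans(pc_scores, k)) for k in (2, 3, 4)}
best_k = max(silhouettes, key=silhouettes.get)
print(silhouettes, best_k)
```

On real compositional data the silhouette curve is usually much flatter than in this toy case, which is why it is typically read together with the elbow plot.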

Model-based

  • R (R Core Team 2020) package mclust (Scrucca et al. 2016)
  • Gaussian mixture model: EEV (ellipsoidal, equal volume and shape, variable orientation), 4 clusters.

[Figure: model-based clusters (mclust)]

Classification

  • Find the most probable source of lead isotope signal in the ore
  • Various methods used…
    • Often only Euclidean distance to the nearest cloud of points (e.g. Ling et al. 2014)
  • Difficult (almost impossible) due to extensive overlap between the clouds of points

[Figure: lead isotope sources only]

Conclusions

Task:
Find meaningful groups in the element compositions of bronze artefacts from an early La Tène period hoard and identify the possible origin of the artefacts based on lead isotopes.

Approach:

  1. Clustering using k-means and a Gaussian mixture model. The results of model-based clustering are difficult to interpret archaeologically…
  2. Attempt to identify possible sources of the lead in the ore based on the lead isotope signals.
    Difficult due to possible mixing and overlaps in the lead isotope signals of different sources.
    • The commonly used method, Euclidean distance, is not suitable for this task, because it does not take into account the shape of the distribution.
    • We suggest Mahalanobis distance as a more appropriate metric.
    • Linear discriminant analysis (LDA) was not very successful at assigning the correct group (proportion of correct predictions: 0.51).
    • Planned next: tree-based methods.
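
Why the shape of the distribution matters can be shown with a small Python sketch. All coordinates are hypothetical (two made-up, rescaled isotope-ratio axes, not data from the slides): source A is an elongated cloud, source B a compact one, and the sample lies within the spread of A but closer to the centroid of B.

```python
import math


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def euclidean(p, mean):
    """Straight-line distance to the centroid; ignores the shape of the cloud."""
    return math.dist(p, mean)


def mahalanobis(p, mean, cov):
    """Distance to the centroid scaled by the cloud's covariance (2-D closed form)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


# Hypothetical source clouds (illustration only).
sources = {
    "A": [(-4, 0.2), (-3, -0.2), (-2, 0.2), (-1, -0.2), (0, 0.2),
          (1, -0.2), (2, 0.2), (3, -0.2), (4, 0.2)],          # elongated cloud
    "B": [(2.8, 1.4), (3.2, 1.6), (3.0, 1.3), (3.0, 1.7),
          (2.9, 1.6), (3.1, 1.4)],                             # compact cloud
}
sample = (3.5, 0.3)  # hypothetical artefact

stats = {name: mean_and_cov(pts) for name, pts in sources.items()}
nearest_euclid = min(stats, key=lambda s: euclidean(sample, stats[s][0]))
nearest_mahal = min(stats, key=lambda s: mahalanobis(sample, *stats[s]))
print(nearest_euclid, nearest_mahal)
```

Euclidean distance assigns the sample to B because only the centroids are compared, whereas Mahalanobis distance assigns it to A, whose spread actually covers the sample.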

Sources

Ling, J., Stos-Gale, Z., Grandin, L., Billström, K., Hjärthner-Holdar, E., Persson, P.-O., 2014. Moving metals II: Provenancing Scandinavian Bronze Age artefacts by lead isotope and elemental analyses. Journal of Archaeological Science 41, 106–132. https://doi.org/10.1016/j.jas.2013.07.018

R Core Team, 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Scrucca, L., Fop, M., Murphy, T. B., Raftery, A. E., 2016. mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. The R Journal 8(1), 289–317.

Slides at https://rpubs.com/knytt/eaa2020_traum