Archaeology Data Infrastructures


Data reuse potentials and limitations to understanding
settlement patterns in prehistory


Petr Pajdla
Petr Květina (supervisor)

17th December 2021, PhD seminar

Understanding Neolithic Society through the study of Axes and Adzes

Former topic of my PhD…

Idea:

  • collect images of polished stone tools (PST);
  • do the magic (morphometrics);
  • observe the variation in shape;
  • explain it using variables like raw material, region, period etc.
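The "magic" step can be sketched as a generalized Procrustes analysis (removing position, size and rotation), followed by a principal component analysis (PCA) of the aligned landmark coordinates. This is a minimal NumPy sketch. It assumes each tool outline has already been digitized as the same number of corresponding 2D landmarks. The function names and the toy data are hypothetical, and the original analyses (done in R) may differ in detail.

```python
import numpy as np

def center_and_scale(shape):
    """Translate a (k, 2) landmark configuration to its centroid and scale it to unit centroid size."""
    centered = shape - shape.mean(axis=0)
    return centered / np.linalg.norm(centered)

def rotate_onto(shape, reference):
    """Rotate `shape` to best match `reference` (orthogonal Procrustes, reflections excluded)."""
    u, _, vt = np.linalg.svd(shape.T @ reference)
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, d]) @ vt
    return shape @ rotation

def generalized_procrustes(shapes, iterations=10):
    """Align all configurations to an iteratively updated mean shape."""
    aligned = [center_and_scale(s) for s in shapes]
    reference = aligned[0]
    for _ in range(iterations):
        aligned = [rotate_onto(s, reference) for s in aligned]
        reference = center_and_scale(np.mean(aligned, axis=0))
    return np.array(aligned)

def shape_pca(aligned):
    """PCA of the flattened aligned coordinates; returns scores and explained-variance ratios."""
    flat = aligned.reshape(len(aligned), -1)
    flat = flat - flat.mean(axis=0)
    u, s, _ = np.linalg.svd(flat, full_matrices=False)
    scores = u * s
    variance = s**2 / np.sum(s**2)
    return scores, variance

# Hypothetical data: 20 noisy copies of one 5-landmark outline, randomly scaled, rotated and shifted
rng = np.random.default_rng(42)
template = np.array([[0, 0], [1, 0], [1.2, 2], [0.5, 3], [-0.2, 2]], dtype=float)
shapes = []
for _ in range(20):
    angle = rng.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
    noisy = template + rng.normal(scale=0.05, size=template.shape)
    shapes.append(rng.uniform(0.5, 2) * noisy @ rot + rng.uniform(-5, 5, size=2))

aligned = generalized_procrustes(shapes)
scores, variance = shape_pca(aligned)
print(variance[:3])
```

The PCA scores are the coordinates of each tool in the morphospace. These are the values one would then relate to raw material, region or period.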

Problems?

Not enough data was gathered to make up for the low quality?

Or do the inaccuracies in the original data get amplified with more data?
(a counter-intuitive effect, cf. Huggett 2020, S13)

A belief that if we gather a critical amount of data, something will come out?

Huggett, J., 2020. Is Big Digital Data Different? Towards a New Archaeological Paradigm. Journal of Field Archaeology 45, S8–S17. https://doi.org/10.1080/00934690.2020.1713281

Map of stone tools distribution. PCA of morphospace with stone adzes.

My work so far...

  • Morphometrics and stone tools
  • Compositional data and isotopes
  • Digital data and reproducibility in archaeology
  • Network approaches
  • Spatial analysis and settlement patterns

PhD Comics.

The road so far...

(1) Morphometrics & polished stone tools

  • How to do the analysis? Why is it useful?
  • Why is there variation in shape?

Model for pottery change in the LBK: newcomer ♀ (exogamy) → learns the local tradition → adapts it by adding motifs brought from her original community → change of style → change possible only at certain moments in time → no space for individual innovation and agency?

Adapting the model for PST: made by ♂ → ♂ are mostly local → no change…
shared idea of ideal shape & manufacturing practice → learning process → small decisions taken during manuf. process → signif. outcome (tyranny of small decisions)

Pajdla, P. 2017. Morfometrie broušených kamenných nástrojů: Jak a proč? [Morphometrics of polished stone tools: How and why?] Otázky neolitu a eneolitu našich zemí 2017, 25. – 27. 9. 2017, Bělecký mlýn u Prostějova.

Pajdla, P. 2018. Morphometric shape analysis in R. Theoretical background, methods and application on LBK shoe-last adzes. Počítačová podpora v archeologii 17, 28. – 30. 5. 2018, Kouty.

Pajdla, P. 2020. Variation in the shape of polished and beveled stone tools as a result of small decisions within borders of shared manufacturing practice. 26th EAA Virtual Annual Meeting, 24. – 30. 8. 2020. poster: https://eaa2020stonevariation.onrender.com/, presentation: https://rpubs.com/knytt/eaa2020stonevariation1

Poster on stone tools variation.

The road so far...

(2) Isotopes and compositional data

  • Long-lasting attempts to find relationships between source ores and the signals from the artefacts.
  • Challenging – many mathematical and theoretical problems involved.
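One of these mathematical problems is that compositional data (e.g. element concentrations summing to a constant) carry only relative information. Standard statistics on raw percentages can therefore produce spurious correlations. A common remedy is Aitchison's log-ratio approach, for example the centred log-ratio (clr) transformation. The sketch below is a generic illustration with made-up values, not the procedure used in the cited studies.

```python
import math

def closure(parts, total=100.0):
    """Rescale positive parts so that they sum to `total` (e.g. weight %)."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centred log-ratio transform: log of each part divided by the geometric mean of all parts."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts (zeros need replacement first)")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical Cu, Zn, Sn, Pb concentrations (wt %) of one alloy
sample = closure([78.0, 15.0, 4.0, 3.0])
print([round(v, 3) for v in clr(sample)])
```

The clr values sum to zero and do not depend on the closure constant. The result is therefore the same whether concentrations are expressed as fractions or as percentages. Zero values must be replaced before the logarithm can be taken.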

Pajdla, P., Danielisová, A., Bursák, D., Strnad, L., Trubač, J. 2020. Difficulties tracing and interpreting patterns in compositional data of metal artefacts. 26th EAA Virtual Annual Meeting, 24. – 30. 8. 2020. https://rpubs.com/knytt/eaa2020_traum

Danielisová, A., Pajdla, P., Bursák, D., Strnad, L., Trubač, J., Kmošek, J., 2021. Claiming the land or protecting the goods? The Duchcov hoard in Bohemia as a proxy for ‘Celtic migrations’ in Europe in the 4th century BCE. Journal of Archaeological Science 127, 105314. https://doi.org/10.1016/j.jas.2020.105314

Bursák, D., Danielisová, A., Magna, T., Pajdla, P., Míková, J., Rodovská, Z., Strnad, L., Trubač, J., 2021. Archaeometric perspective on the emergence of brass north of the Alps around the turn of the Era. [accepted in Scientific Reports, preprint: https://doi.org/10.21203/rs.3.rs-715158/v1]

Duchcov article front matter.

The road so far...

(3) Digital data & reproducible analyses

  • Connected to my work as a data manager at the Archaeological Information System of the Czech Republic (AIS CR).
  • How to take care of the digital data?
  • How to make our analyses reproducible?

Kubelková, H., Pajdla, P. 2021. Data management in archaeology project.
[workshop 19. 11. 2021, article in prep. for Advances in Archaeological Practice, 2022?]

Schmidt, S., Pajdla, P., Schmid, C. 2021. Developing R packages (workshop). Digital Crossroads. Computer Applications in Archaeology Conference. 14. – 18. 6. 2021, Limassol (online). https://github.com/sslarch/caa2021_Rpackage_workshop

Pajdla, P., Kubelková, H., Květina, P. 2021. Data driven Archaeology. Are we there yet? Theoretical Approaches to Computational Archaeology (CE TAG 7th Annual Meeting). 19. – 20. 10. 2021, Brno.

Road to reproducibility.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. DOI 10.5281/zenodo.3332807

CAA SIG SSLA logo.
AIS CR logo.

The road so far...

(4) Networks

  • Networks as a great way of thinking about complex systems.
  • Focus on relationships (edges) between individual elements of the system (vertices/nodes with given attributes).
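The focus on edges can be sketched as a simple graph. Burials are vertices, and an edge links two burials that share at least one grave-good type, weighted by the number of shared types. The burial IDs and grave-good lists are hypothetical, not data from Vedrovice, and this linking rule is only one of many possible choices.

```python
from itertools import combinations

# Hypothetical burials (vertices) with their grave-good types (attributes)
grave_goods = {
    "B1": {"adze", "spondylus"},
    "B2": {"adze", "pottery"},
    "B3": {"pottery"},
    "B4": {"spondylus", "pottery"},
    "B5": set(),
}

def build_edges(attributes):
    """Return weighted edges: pairs of vertices and the number of shared attribute values."""
    edges = {}
    for a, b in combinations(sorted(attributes), 2):
        shared = len(attributes[a] & attributes[b])
        if shared > 0:
            edges[(a, b)] = shared
    return edges

def degree(edges, vertices):
    """Count the edges incident to each vertex (isolated vertices get 0)."""
    deg = {v: 0 for v in vertices}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = build_edges(grave_goods)
print(edges)
print(degree(edges, grave_goods))
```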

Pajdla, P. 2018. Early Neolithic Materialized Identity Networks: A Case Study of Vedrovice Cemetery. Historical Network Research Conference, 11. – 13. 9. 2018, Brno.

Pajdla, P. 2019. Exploring Early Neolithic Materialized Identity Networks. Check Object Integrity. 47th Computer Applications and Quantitative Methods in Archaeology. 23. – 27. 4. 2019, Kraków.

Pajdla, P. 2022? Spatial Patterns and Grave Goods Differences at the Cemetery of Vedrovice (Czech Republic): A Resampling Approach to Identity Markers in the Early Neolithic. Journal of Computer Applications in Archaeology. [under review]

Network of burials at Vedrovice cemetery.

Artefact associations at the Vedrovice cemetery.

The road so far...

(5) Settlement patterns

  • Focus on:

    • first-order properties – relationships between sites and their surroundings, e.g. the environment,
    • second-order properties – relationships between sites and other sites.
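A minimal way to look at first-order properties is to estimate the intensity (sites per unit area) in a grid of quadrats. Variation in intensity can then be related to environmental covariates. The coordinates, study window and grid size below are hypothetical.

```python
def quadrat_intensity(points, xmin, xmax, ymin, ymax, nx, ny):
    """Count points in an nx x ny grid and divide by quadrat area to get intensity per cell."""
    width = (xmax - xmin) / nx
    height = (ymax - ymin) / ny
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # ignore points outside the study window
        col = min(int((x - xmin) / width), nx - 1)  # points on the max edge go to the last cell
        row = min(int((y - ymin) / height), ny - 1)
        counts[row][col] += 1
    area = width * height
    return [[c / area for c in row] for row in counts]

# Hypothetical site coordinates in km within a 10 x 10 km window
sites = [(1.0, 1.5), (2.2, 0.8), (1.7, 3.9), (7.5, 8.1), (8.8, 9.6), (6.1, 2.4)]
intensity = quadrat_intensity(sites, 0, 10, 0, 10, nx=2, ny=2)
print(intensity)
```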

Pajdla, P., Trampota, F. 2019. Transformations in settlement structures and distribution systems in the Neolithic Moravia. Socio-Environmental Dynamics over the Last 15,000 Years: The Creation of Landscapes VI, 11. – 16. 3. 2019, Kiel.

Pajdla, P., Trampota, F. 2021? Neolithic Settlements in Central Europe: Data from the Project 'Lifestyle as an Unintentional Identity in the Neolithic'. Journal of Open Archaeology Data. [accepted, data deposited at https://doi.org/10.5281/zenodo.5653180]

Trampota, F., Pajdla, P. 2022? Spatial analysis of Neolithic settlement patterns in central Europe: Case study of East-Bohemia and the Morava river catchment [in prep.]

New topic proposal


Archaeology Data Infrastructures:
Data reuse potentials and limitations to understanding settlement patterns in prehistory

Problem definition & questions

  • Data reuse
  • Representativeness of the data & biases
  • Sites as point patterns

Question I

Data (re)use

Data munging and wrangling (the extract, transform, load paradigm) can take up 50% to 70% of the time dedicated to preparing a PhD (Mons 2018, 10–14).
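The extract, transform, load steps can be sketched for a hypothetical legacy table of site records:

- extract rows from a CSV export;
- transform them into a harmonized schema (normalizing period labels, parsing coordinates, dropping records without a location);
- load them into a target store.

The column names, the period vocabulary and the in-memory SQLite target are illustrative assumptions. They do not reflect the schema of any real archaeology data infrastructure.

```python
import csv
import io
import sqlite3

# Hypothetical export from a legacy database: inconsistent period labels, decimal commas, gaps
LEGACY_CSV = """site_id;period;lon;lat
101;LBK;16,37;49,02
102;Linear Pottery;16,41;48,99
103;SBK;16,60;49,20
104;lbk ;16,52;49,10
105;Neolithic?;;
"""

# Hypothetical mapping of legacy labels onto a controlled vocabulary
PERIOD_MAP = {"lbk": "LBK", "linear pottery": "LBK", "sbk": "SBK"}

def extract(text):
    """Extract: read the semicolon-delimited export into dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))

def transform(rows):
    """Transform: harmonize periods, parse decimal-comma coordinates, drop rows without location."""
    clean = []
    for row in rows:
        if not row["lon"] or not row["lat"]:
            continue  # records without coordinates cannot enter a spatial analysis
        clean.append((
            int(row["site_id"]),
            PERIOD_MAP.get(row["period"].strip().lower(), "unknown"),
            float(row["lon"].replace(",", ".")),
            float(row["lat"].replace(",", ".")),
        ))
    return clean

def load(records):
    """Load: write the harmonized records into an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sites (site_id INTEGER, period TEXT, lon REAL, lat REAL)")
    con.executemany("INSERT INTO sites VALUES (?, ?, ?, ?)", records)
    return con

con = load(transform(extract(LEGACY_CSV)))
print(con.execute("SELECT period, COUNT(*) FROM sites GROUP BY period ORDER BY period").fetchall())
```

Most of the effort in practice goes into the transform step. There, decisions such as the period mapping have to be documented so that others can judge the reused data.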

  • How difficult is it to reuse data already existing in archaeology data infrastructures and legacy databases?
  • Can we reuse data collected with different questions in mind to answer questions unforeseen by the original data collectors?

→ A review of the current state and practice…

“[…] at least in certain parts of the world, we cannot in all good conscience claim ‘we don’t yet have enough evidence’ or that we should ‘wait till the evidence is in’.” (Bevan 2015, 1477)

Mons, B. 2018. Data Stewardship For Open Science: Implementing FAIR principles. Boca Raton: CRC Press, Taylor & Francis Group.

Bevan, A. 2015. The data deluge. Antiquity 89(348): 1473–1484. DOI: https://doi.org/10.15184/aqy.2015.102.

Map of archaeological sites in the OSM data.

Question II

The biases in the data

The representativeness of the available data is not constant across the Czech Republic, and different biases are present…

  • Do (significantly) underrepresented regions exist in the data?
  • What is the cause? Can we identify the present biases?
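A first, crude check for underrepresented regions compares two numbers for each region. One is the observed number of sites. The other is the number expected if sites were distributed in proportion to region area. The Pearson residuals then show which regions deviate. The region names, counts and areas are invented for illustration. A real test would need a more appropriate null model (e.g. one accounting for environmental suitability), because equal site density across regions is not a realistic expectation.

```python
import math

# Hypothetical site counts and areas (km2) for four regions
regions = {
    "A": {"sites": 120, "area": 1000.0},
    "B": {"sites": 95, "area": 900.0},
    "C": {"sites": 20, "area": 800.0},
    "D": {"sites": 65, "area": 700.0},
}

def pearson_residuals(data):
    """Compare observed counts with counts expected under area-proportional allocation."""
    total_sites = sum(r["sites"] for r in data.values())
    total_area = sum(r["area"] for r in data.values())
    residuals = {}
    for name, r in data.items():
        expected = total_sites * r["area"] / total_area
        residuals[name] = (r["sites"] - expected) / math.sqrt(expected)
    return residuals

res = pearson_residuals(regions)
chi_square = sum(v**2 for v in res.values())  # compare with a chi-square distribution, df = 4 - 1
underrepresented = [name for name, v in res.items() if v < -2]  # -2 is only a rule of thumb
print(underrepresented, round(chi_square, 1))
```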

Causes?

  • History of research.
    • Less focus on some areas – too far from the research centres?
    • Different approaches of different institutions?
  • History of land use.
    • Effects of long-term forest cover vs agricultural activities.
    • Effects of urban areas?

and more…

Map of positive and negative finds.

Question III

Sites as point patterns

Second-order effects (properties), i.e. relationships between sites and other site locations, are explored (Nakoinz, Knitter 2016, 135–144).

(Usually, only first-order effects, i.e. relationships with the surroundings of the point locations, are explored…)
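A standard way to assess second-order properties is Ripley's K function. For a distance r, K(r) is the expected number of further points within r of a typical point, divided by the intensity. Under complete spatial randomness (CSR), K(r) = πr², so the transformed L(r) = √(K(r)/π) is approximately r. Values above r suggest clustering; values below r suggest regularity (repulsion). The sketch below is a naive estimator without edge correction and uses hypothetical coordinates. Established implementations (e.g. `Kest()` in the R package spatstat) add edge corrections and should be preferred for real data.

```python
import math
import random

def ripley_k(points, r, area):
    """Naive Ripley's K estimate (no edge correction) at distance r."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                pairs += 1
    return area * pairs / (n * (n - 1))

def ripley_l(points, r, area):
    """Variance-stabilized L(r) = sqrt(K(r) / pi); approximately r under CSR."""
    return math.sqrt(ripley_k(points, r, area) / math.pi)

random.seed(1)
# Hypothetical clustered pattern: sites grouped around three centres in a 10 x 10 km window
centres = [(2, 2), (7, 3), (5, 8)]
clustered = [
    (cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3))
    for cx, cy in centres for _ in range(15)
]
for r in (0.5, 1.0, 2.0):
    print(r, round(ripley_l(clustered, r, area=100.0), 2))
```

Without edge correction the estimate is biased downwards near the window boundary. For this reason, it only illustrates the idea.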

  • Are the locations of points dependent on other point locations?
  • Are there any clustering or repulsive processes taking place in the point patterns?
  • Is there a relationship with site locations in previous periods?

→ What is the explanation in archaeology?
→ Can we simulate existing settlement patterns?
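Simulation can test whether an observed pattern departs from a null model. The procedure has three steps. First, generate many patterns under the null, here CSR conditioned on the observed number of points (a binomial process in the same window). Second, compute a summary statistic for each pattern. Third, check where the observed value falls among the simulated ones. The sketch uses the mean nearest-neighbour distance as the statistic and a simple two-sided Monte Carlo p-value. The window, the number of simulations and the example pattern are hypothetical. More realistic null models (inhomogeneous intensity, inhibition or cluster processes) would be needed to simulate actual settlement patterns.

```python
import math
import random

def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbour."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def csr_pattern(n, width, height, rng):
    """n independent uniform points in a rectangular window (CSR given n)."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def monte_carlo_test(observed, width, height, n_sim=199, seed=0):
    """Two-sided Monte Carlo p-value for the mean nearest-neighbour distance under CSR."""
    rng = random.Random(seed)
    obs = mean_nn_distance(observed)
    sims = [mean_nn_distance(csr_pattern(len(observed), width, height, rng)) for _ in range(n_sim)]
    lower = sum(s <= obs for s in sims)
    upper = sum(s >= obs for s in sims)
    p_value = min(1.0, 2 * (min(lower, upper) + 1) / (n_sim + 1))
    return obs, p_value

# Hypothetical regular (repulsive) pattern: sites on a jittered 6 x 6 grid in a 12 x 12 km window
rng = random.Random(7)
regular = [(1 + 2 * i + rng.uniform(-0.2, 0.2), 1 + 2 * j + rng.uniform(-0.2, 0.2))
           for i in range(6) for j in range(6)]
obs, p = monte_carlo_test(regular, 12, 12)
print(round(obs, 2), p)
```

Replacing `csr_pattern` with a simulator for a candidate settlement model turns the same loop into a check of whether that model can reproduce the observed pattern.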

Nakoinz, O and Knitter, D. 2016. Modelling Human Behaviour in Landscapes. Quantitative Archaeology and Archaeological Modelling. Cham: Springer. DOI: https://doi.org/10.1007/978-3-319-29538-1.

image: Baddeley, A. 2010. Analysing spatial point patterns in R, v. 4.1, p. 8.

Point pattern models for different intensities and interactions.

Methods

Transparent and open approaches

Large data collections require approaches different from the methods usually used to analyse archaeological data, and more than ever we see the need for transparent and reproducible methods (Huggett 2020, S14–S15).

Reproducibility (Marwick et al., 2018; Marwick, 2017):

  • instead of drag-and-drop and button clicking approaches, document the analysis using code;
  • publish both the data and code used in the analysis producing the results.
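In a coded workflow, the information needed to rerun an analysis can be recorded alongside its results. The sketch below is an illustrative Python analogue of this practice, not the R-package research compendium described by Marwick et al. (2018). It fixes the random seed and records a checksum of the input data and the software version next to the result. The file names and the toy bootstrap "analysis" are hypothetical.

```python
import hashlib
import json
import platform
import random
import tempfile
from pathlib import Path

SEED = 2021  # fixed seed so that resampling results can be regenerated exactly

def sha256_of(path):
    """Checksum of the input file, so readers can verify they use the same data."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def run_analysis(data_path, out_path):
    """Run a toy analysis and write the result together with its provenance."""
    random.seed(SEED)
    values = [float(v) for v in Path(data_path).read_text().split()]
    # Toy analysis: bootstrap estimate of the mean (stands in for the real computation)
    boot = [sum(random.choices(values, k=len(values))) / len(values) for _ in range(1000)]
    result = {
        "bootstrap_mean": sum(boot) / len(boot),
        "input_sha256": sha256_of(data_path),
        "seed": SEED,
        "python": platform.python_version(),
    }
    Path(out_path).write_text(json.dumps(result, indent=2))
    return result

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.txt"
    data.write_text("3.1 2.7 4.0 3.6 2.9")
    first = run_analysis(data, Path(tmp) / "result.json")
    second = run_analysis(data, Path(tmp) / "result.json")
    print(first == second)  # same data, same seed and same code give the same result
```

Publishing the script together with the data file lets anyone regenerate the result and compare checksums, instead of relying on a description of menu clicks.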

Huggett, J., 2020. Is Big Digital Data Different? Towards a New Archaeological Paradigm. Journal of Field Archaeology 45, S8–S17. https://doi.org/10.1080/00934690.2020.1713281

Marwick, B. 2017. Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. Journal of Archaeological Method and Theory 24(2): 424–450. DOI: https://doi.org/10.1007/s10816-015-9272-9.

Marwick, B, Boettiger, C and Mullen, L. 2018. Packaging Data Analytical Work Reproducibly Using R (and Friends). The American Statistician 72(1): 80–88. DOI: https://doi.org/10.1080/00031305.2017.1375986.

Reproducible, replicable, robust and generalisable analysis.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. DOI 10.5281/zenodo.3332807.

R logo.
GitHub logo.
Zenodo logo.
CAA SIG SSLA logo.

Summary

Archaeology Data Infrastructures:
Data reuse potentials and limitations to
understanding settlement patterns in prehistory

  • Data reuse.
  • Awareness of quality and biases present in the data.
  • Verification of the results by checking their robustness.
  • Open science principles, reproducibility, coded analysis.
  • Insight into settlement patterns through the analysis of second order effects.

Thank you for your attention!

Data by period in the AMCR.