stat4arch

Petr Pajdla & Peter Tkáč

AES_707: Statistics seminar for archaeologists

Seminar 5

14. 4. 2022

Today:

  • Reproducibility
  • Rmarkdown
  • Your datasets

Exercise

Binford's dataset

# install.packages("binford")
library(binford)
data(LRB)
data(LRBkey)

Explore the data

  1. How many variables are in dataset LRB?
  2. How many observations?
  3. What does Binford's LRB data set describe? In other words, what are the observations and variables?
  4. What is the purpose of the LRBkey data set?
  5. Is there any correlation between the mean size of a family (famsz) and the size of a single family dwelling (sz1fam)?
    Hint: in the cor() function, use only complete observations (argument use = "complete.obs")
  6. How does the size of dwellings (sz1fam) vary across continents (wldsec)?
  7. Does population density (density) differ by primary source of food (subsp.1)?

Consider which type of plot (histogram, density plot, boxplot, scatterplot) will best help you answer each question. For some questions, more than one type is suitable.
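The kinds of calls the questions ask for can be sketched in base R. The small data frame below is invented for illustration: only the column names (famsz, sz1fam, wldsec) are borrowed from the exercise, and the values and continent labels are made up. With the real data you would use LRB in place of toy.

```r
# Toy stand-in for LRB: values are invented, only the column names follow the exercise
toy <- data.frame(
  famsz  = c(4.2, 5.1, NA, 6.0, 4.8, 5.5),
  sz1fam = c(8.0, 10.5, 9.0, NA, 9.5, 11.0),
  wldsec = c("Africa", "Africa", "Asia", "Asia", "Australia", "Australia")
)

n_vars <- ncol(toy)  # number of variables (columns)
n_obs  <- nrow(toy)  # number of observations (rows)

# Correlation computed only from rows where both variables are present
r <- cor(toy$famsz, toy$sz1fam, use = "complete.obs")

# Scatterplot: relationship between two numeric variables
plot(toy$famsz, toy$sz1fam, xlab = "famsz", ylab = "sz1fam")

# Boxplot: a numeric variable split by a categorical one
boxplot(sz1fam ~ wldsec, data = toy)
```

Without use = "complete.obs", cor() returns NA as soon as either variable contains a missing value.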

Reproducible research

Reproducibility

  • Allow other people to build upon your work…

For the findings of a study to be reproducible means that results obtained (…) in a statistical analysis of a data set should be achieved again with a high degree of reliability when the study is replicated.
https://en.wikipedia.org/wiki/Reproducibility

(Marwick 2017; Marwick et al. 2018)

In your article/thesis/project do:

Guides

Kieran Healy

The Plain Person’s Guide to Plain Text Social Science
http://plain-text.co/

Ben Marwick

  • Marwick, B. 2017: Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. Journal of Archaeological Method and Theory 24(2): 424–450. DOI: 10.1007/s10816-015-9272-9.
  • Marwick, B., Boettiger, C. and Mullen, L. 2018: Packaging Data Analytical Work Reproducibly Using R (and Friends). The American Statistician 72(1): 80–88. DOI: 10.1080/00031305.2017.1375986.

British Ecological Society

Guides to better science: guide on reproducible code and data management.

RMarkdown

Packages

# install.packages("rmarkdown")
# install.packages("knitr")

library(rmarkdown)
library(knitr)

Rmarkdown - what is it?

Types of content

  • Header
  • R code
  • Plain text…

YAML header

The YAML header is delimited by --- lines and carries metadata such as author, date, type of output…

Code chunks

Code chunks separated by:

```{r chunk-name}
code here…
```

Text

  • plain text
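The three types of content can be combined into one minimal document. The sketch below builds such a document as a character vector in R and writes it to a temporary file. The title and author values are placeholders, and html_document is just one possible output type.

```r
# Three backticks, built programmatically so they do not clash with this example's own fence
fence <- strrep("`", 3)

# A minimal R Markdown document: YAML header, plain text, one code chunk
rmd_lines <- c(
  "---",
  "title: \"Seminar 5 notes\"",   # placeholder title
  "author: \"Your name\"",        # placeholder author
  "date: \"14. 4. 2022\"",
  "output: html_document",
  "---",
  "",
  "Plain text goes here.",
  "",
  paste0(fence, "{r chunk-name}"),
  "a <- 10 + 5",
  "a",
  fence
)

# Write the document to a temporary .Rmd file
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)

# Rendering needs the rmarkdown package and pandoc, so it is left commented out
# rmarkdown::render(rmd_file)
```

In practice you would write the .Rmd file directly in an editor such as RStudio. Building it from R here only keeps the example self-contained.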

Code chunks - place for your code

Keyboard shortcut: Ctrl + Alt + I

input:

output:

a <- 10+5
a
[1] 15

Code chunks - options I.

  • By default, both the code and its result are shown in the output file. You can change this by adding options to the chunk header: {r, <option>}

Options (examples):

  • echo = FALSE

    • the result is shown, but the code is hidden in the output file
    • suitable e.g. for plotting code, when you only want to show the plot,…
  • include = FALSE

    • neither the result nor the code is shown in the output file (the code is still run)
    • suitable e.g. for loading packages, basic manipulation of the data,…
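The effect of these two options can be checked directly from R by knitting a one-chunk document with knitr::knit(text = ...). This is a sketch only: knit_chunk is a helper name made up for this example, and it assumes the knitr package is installed.

```r
library(knitr)

# Three backticks, built programmatically so they do not clash with this example's own fence
fence <- strrep("`", 3)

# Hypothetical helper: knit a document consisting of one chunk with the given header
knit_chunk <- function(header) {
  knit(text = c(paste0(fence, header), "a <- 10 + 5", "a", fence), quiet = TRUE)
}

out_default <- knit_chunk("{r}")                   # code and result
out_echo    <- knit_chunk("{r, echo = FALSE}")     # result only
out_include <- knit_chunk("{r, include = FALSE}")  # nothing shown, code still runs

# Print the three markdown outputs one after another
cat(out_default, out_echo, out_include, sep = "\n-----\n")
```

Comparing the three outputs shows which parts of the chunk end up in the knitted document.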

Code chunks - options II.

Inline code

input:

output:
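Inline code places the result of an R expression directly inside a sentence, using a single pair of backticks that starts with the letter r. A minimal sketch, assuming the knitr package is installed; the sentence and the hidden chunk are made up for illustration:

```r
library(knitr)

# Three backticks, built programmatically so they do not clash with this example's own fence
fence <- strrep("`", 3)

# A hidden chunk defines a; the sentence below it uses a as inline code
input <- c(
  paste0(fence, "{r, include = FALSE}"),
  "a <- 10 + 5",
  fence,
  "",
  "The value of a is `r a`."
)

# Knit the text; the inline expression is replaced by its value
output <- knit(text = input, quiet = TRUE)
cat(output)
```

In the knitted text, the inline expression is replaced by its value, so the sentence reads "The value of a is 15."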

Tables using `knitr::kable`

# sipky is assumed to hold the DartPoints data frame from the archdata package
kable(sipky[1:5, 1:10], caption = "DartPoints dataset")

Table: DartPoints dataset

|Name |Catalog |TARL     |Quad  | Length| Width| Thickness| B.Width| J.Width| H.Length|
|:----|:-------|:--------|:-----|------:|-----:|---------:|-------:|-------:|--------:|
|Darl |41-0322 |41CV0536 |26/59 |   42.8|  15.8|       5.8|    11.3|    10.6|     11.6|
|Darl |35-2946 |41CV0235 |21/63 |   40.5|  17.4|       5.8|      NA|    13.7|     12.9|
|Darl |35-2921 |41CV0132 |20/63 |   37.5|  16.3|       6.1|    12.1|    11.3|      8.2|
|Darl |36-3487 |41CV0594 |10/54 |   40.3|  16.1|       6.3|    13.5|    11.7|      8.3|
|Darl |36-3321 |41CV1023 |12/58 |   30.6|  17.1|       4.0|    12.6|    11.2|      8.9|

Markdown syntax