Petr Pajdla & Peter Tkáč
AES_707: Statistics seminar for archaeologists
17. 3. 2022
mean(x)
\( \begin{aligned}
\overline{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} (\sum^n_{i=1}x_i)
\end{aligned} \)
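As a quick check of the formula, the mean can be computed by hand and compared with mean(). This is only a sketch: the vector x reuses the feature counts from the lokality table shown later in the seminar, and mean_manual is a name chosen here for illustration.

```r
# Feature counts per site, taken from the lokality table used later in the seminar
x <- c(27, 13, 55, 29, 20)

# Sum of all values divided by the number of values
mean_manual <- sum(x) / length(x)

mean_manual
mean(x)  # built-in equivalent
```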
median(x)
Robust, minimizes influence of outliers.
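A small sketch of what robustness means in practice, assuming one extreme value is appended to the feature counts from the lokality table. The value 550 is made up for illustration (for example a typing error); it is not part of the seminar data.

```r
# Feature counts per site, taken from the lokality table used later in the seminar
x <- c(27, 13, 55, 29, 20)

# Hypothetical outlier added for illustration only
x_outlier <- c(x, 550)

# The mean is pulled strongly towards the outlier
mean(x)
mean(x_outlier)

# The median barely moves
median(x)
median(x_outlier)
```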
(range)
max(x) - min(x) or diff(range(x))
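In R, range(x) returns the minimum and the maximum as a vector of two values rather than their difference, so the spread itself needs one more step. A minimal sketch, again using the feature counts from the lokality table; value_range and spread are illustrative names.

```r
# Feature counts per site, taken from the lokality table used later in the seminar
x <- c(27, 13, 55, 29, 20)

# range() gives the two extremes, not a single number
value_range <- range(x)
value_range

# Either expression gives the width of the range
max(x) - min(x)
spread <- diff(range(x))
spread
```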
(variance and standard deviation)
sd(x)
\( \begin{aligned}
s = \sqrt{s^2} = \sqrt{\frac{\sum^n_{i=1}(x_i-\overline{x})^2}{n-1}}
\end{aligned} \)
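The formula can be followed step by step in R. The sketch below assumes the same feature counts as in the lokality table; var_manual and sd_manual are illustrative names. Note the division by n - 1, which is what var() and sd() use.

```r
# Feature counts per site, taken from the lokality table used later in the seminar
x <- c(27, 13, 55, 29, 20)
n <- length(x)

# Variance: sum of squared deviations from the mean, divided by n - 1
var_manual <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation: square root of the variance
sd_manual <- sqrt(var_manual)

# Built-in equivalents
var(x)
sd(x)
```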
(midspread, IQR, quantile, interquartile range)
IQR(x)
Robust, minimizes influence of outliers.
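The interquartile range is the distance between the first and third quartile. A minimal sketch with the feature counts from the lokality table, relying on the default quantile type in R; iqr_manual is an illustrative name and the value 550 is a made-up outlier.

```r
# Feature counts per site, taken from the lokality table used later in the seminar
x <- c(27, 13, 55, 29, 20)

# First and third quartile (default quantile algorithm in R)
q <- quantile(x, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])
iqr_manual

# Built-in equivalent
IQR(x)

# With a hypothetical outlier the IQR grows far less than the full range
IQR(c(x, 550))
diff(range(c(x, 550)))
```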
Anscombe's quartet
# A tibble: 4 × 6
    set `mean(x)` `sd(x)` `mean(y)` `sd(y)` `cor(x, y)`
  <int>     <dbl>   <dbl>     <dbl>   <dbl>       <dbl>
1     1         9    3.32      7.50    2.03       0.816
2     2         9    3.32      7.50    2.03       0.816
3     3         9    3.32      7.5     2.03       0.816
4     4         9    3.32      7.50    2.03       0.817
Four sets of numerical data that all have almost identical descriptive statistics…
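The summary statistics of the quartet can be recomputed with the anscombe data set that ships with R. This is a base R sketch and not necessarily how the table in the slides was produced; summary_table and its column names are chosen here for illustration.

```r
# anscombe is part of the datasets package: columns x1..x4 and y1..y4
summary_table <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(set    = i,
             mean_x = mean(x),
             sd_x   = sd(x),
             mean_y = mean(y),
             sd_y   = sd(y),
             cor_xy = cor(x, y))
}))

# Rounded for easier comparison across the four sets
print(round(summary_table, 3))
```

Plotting each pair, for example with plot(anscombe$x1, anscombe$y1), shows how different the four sets look despite these numbers.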
(Minard 1869)
(Snow 1854)
(Bar chart)
Distribution of values of a qualitative variable.
Distribution of values of a quantitative variable.
Similar to histogram, great for comparison.
(Box plot)
Comparison of two or more quantitative variables.
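A minimal base R sketch of such a comparison, assuming the four y variables of the built-in anscombe data as the quantitative variables. The seminar itself works with ggplot2; base graphics are used here only to keep the example dependency-free, and b is an illustrative name.

```r
# Four quantitative variables side by side, one box per variable
b <- boxplot(anscombe[, c("y1", "y2", "y3", "y4")],
             ylab = "y",
             main = "Anscombe's quartet: y values")

# boxplot() also returns the numbers behind the boxes:
# lower whisker, lower hinge, median, upper hinge, upper whisker per variable
b$stats
```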
Exercise with the table lokality.
   lokalita objekty_ks
1 Vedrovice         27
2  Kyjovice         13
3  Pohansko         55
4 Mikulčice         29
5    Znojmo         20
barplot(lokality$objekty_ks,
        names.arg = lokality$lokalita,
        col = "lightblue")
install.packages("ggplot2")
library(ggplot2)
ggplot(data = lokality, aes(x = lokalita, y = objekty_ks)) +
  geom_bar(stat = "identity")
Long form:
ggplot(data = lokality, mapping = aes(x = lokalita, y = objekty_ks)) +
  geom_bar(stat = "identity")
Short form:
ggplot(lokality, aes(lokalita, objekty_ks)) +
  geom_bar(stat = "identity")
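For data that already contain the bar heights, ggplot2 also offers geom_col(), which draws the same chart as geom_bar(stat = "identity"). The sketch below rebuilds the lokality table from the printed values, since the slides do not show how it was created; p_col is an illustrative name.

```r
library(ggplot2)

# Reconstruction of the lokality table from the values printed in the slides
lokality <- data.frame(
  lokalita   = c("Vedrovice", "Kyjovice", "Pohansko", "Mikulčice", "Znojmo"),
  objekty_ks = c(27, 13, 55, 29, 20)
)

# geom_col() uses the y values as bar heights, no stat argument needed
p_col <- ggplot(lokality, aes(lokalita, objekty_ks)) +
  geom_col()

p_col
```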
Exercise with the EWBurials dataset.
library(archdata) # ?EWBurials
data(EWBurials)
hroby <- data.frame(EWBurials)
head(hroby, 4)
     Group  North  West         Age  Sex Direction Looking   Goods
011      2  96.96 90.32 Young Adult Male        42     283 Present
014      2 100.20 90.61 Young Adult Male        28     272 Present
015      2 101.74 91.62   Old Adult Male       350     219 Present
016a     2 101.00 90.47 Young Adult Male       335      60  Absent
p <- ggplot(hroby, aes(x = Sex))
p + geom_bar()
How would you make a similar plot showing the distribution of burials by age?
p <- ggplot(hroby, aes(x = Age))
p + geom_bar()
p <- ggplot(hroby, aes(x = Sex)) +
  geom_bar()
p + labs(x = "sex",
         y = "count",
         title = "Number of burials by sex",
         caption = "archdata::EWBurials")
Stacked bar chart: geom_bar().
p <- ggplot(hroby, aes(x = Sex, fill = Age))
p + geom_bar()
Dodged bar chart: geom_bar(position = "dodge").
p <- ggplot(hroby, aes(x = Sex, fill = Age))
p + geom_bar(position = "dodge")
p <- ggplot(hroby, aes(x = Sex, fill = Goods))
p + geom_bar()
p <- ggplot(hroby, aes(x = Age, fill = Goods))
p + geom_bar()
Proportional (filled) bar chart: geom_bar(position = "fill").
p <- ggplot(hroby, aes(x = Age, fill = Goods))
p + geom_bar(position = "fill")
p <- ggplot(hroby, aes(x = Age))
p + geom_bar() +
  facet_grid(Sex ~ Group)
What is the difference between these datasets?
hroby
     Group  North  West         Age  Sex Direction Looking   Goods
011      2  96.96 90.32 Young Adult Male        42     283 Present
014      2 100.20 90.61 Young Adult Male        28     272 Present
015      2 101.74 91.62   Old Adult Male       350     219 Present
016a     2 101.00 90.47 Young Adult Male       335      60  Absent
018      2 101.65 90.46   Old Adult Male         3      86 Present
020      1  95.17 90.53 Young Adult Male       142      21  Absent
lokality
   lokalita objekty_ks
1 Vedrovice         27
2  Kyjovice         13
3  Pohansko         55
4 Mikulčice         29
5    Znojmo         20
p <- ggplot(hroby, aes(x = Age))
p + geom_bar()
p <- ggplot(lokality,
            aes(x = lokalita,
                y = objekty_ks))
p + geom_bar(stat = "identity")
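One way to see the difference: hroby holds one row per burial, so geom_bar() has to count the rows itself, whereas lokality already holds one count per site, so the bar heights are taken as given. The sketch below mimics that counting step in base R. The burials data frame is a hypothetical miniature stand-in for hroby with made-up values, and counts is an illustrative name.

```r
# Hypothetical miniature stand-in for hroby: one row per burial (values made up)
burials <- data.frame(
  Age = c("Young Adult", "Old Adult", "Young Adult",
          "Child", "Old Adult", "Young Adult")
)

# Counting rows per category turns raw observations into a table of counts,
# which has the same shape as lokality (one row per category plus a count)
counts <- as.data.frame(table(Age = burials$Age), responseName = "n")
counts
```

A table like counts would be plotted with geom_bar(stat = "identity"), while the raw burials data would be plotted with a plain geom_bar().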
Journal of Open Archaeology Data
https://openarchaeologydata.metajnl.com/
Book: Quantitative Methods in Archaeology Using R by D. L. Carlson, and the associated R package archdata
library(archdata)
?archdata # list of data sets in the package