We will introduce you to the Neotoma
Paleoecology Database environment, including explorer, Tilia and the
neotoma2
R package. The database uses an application
programming interface (API), which serves the explorer as well as Tilia
but also lets you query the database in R. We will show you which
information you can most easily extract in either of these choices and
provide a hands-on R exercise on working with charcoal and pollen
data.
The
neotoma2
package is available from GitHub and can be
installed in R using the devtools
package. We also load
other packages.
devtools::install_github('NeotomaDB/neotoma2')
library(neotoma2)
library(riojaPlot)
library(rioja)
library(vegan)
Sites in neotoma2
are primarily spatial objects. They
have names, locations, and are found within the context of geopolitical
units, but within the API and the package, the site itself does not have
associated information about taxa, dataset types or ages. It is simply
the container into which we add that information.
We can use several parameters to search sites, one of them is
taxa
. Here we search all sites containing “Charcoal” taxon
and its variations.
char_sites <- get_sites(taxa = 'Charcoal%', all_data = TRUE)
neotoma2::summary(head(char_sites, 10))
## siteid sitename collectionunit chronolgies datasets types
## 1 8 Abalone Rocks Marsh ABALONE 0 1 charcoal
## 2 196 Hells Kitchen Lake HELLSK05 0 1 pollen
## 3 214 Arts Lough ARTSLOUG 0 1 pollen
## 4 273 Blacktail Pond BTP06 0 1 pollen
## 5 459 Wentworth Lake WENT 0 1 macrocharcoal
## 6 519 Cub Creek Pond CUBCRK17 0 1 pollen
## 7 525 Cupola Pond CUPOLA4 0 1 macrocharcoal
## 8 2303 Silver Lake SILVER07 0 1 charcoal
## 9 2520 Stotzel-Leis Site STOT14 0 1 charcoal
## 10 2570 Lake Tulane TULANEG 0 1 charcoal
Now we get datasets to the sites, and filter charcoal dataset types and plot on the map.
char_ds <- char_sites %>% get_datasets(all_data = TRUE)
table(as.data.frame(datasets(char_ds))$datasettype)
##
## charcoal charcoal surface sample dinoflagellates
## 228 94 1
## macrocharcoal microcharcoal plant macrofossil
## 28 26 30
## pollen
## 414
char_mc <- char_ds %>% neotoma2::filter(datasettype %in% c("macrocharcoal", "charcoal", "microcharcoal"))
neotoma2::plotLeaflet(char_mc)
As an example, we will use the site ‘Prášilské jezero’, a mountain lake located in Czechia (Šumava Mts.). This site has been analyzed for several proxies, including pollen and macrocharcoal.
First, we obtain the siteid
and other metadata including
its datasets for the chosen site and plot its position:
PRA_ds <- get_datasets(get_sites(sitename = "Prášilské jezero"), all_data = TRUE)
plotLeaflet(PRA_ds)
Filter out the datasets ‘pollen’ and ‘charcoal’
PRA_pol_ch <- PRA_ds %>% neotoma2::filter(datasettype %in% c("pollen", "charcoal"))
And pull in and extract the sample data
PRA_rec <- PRA_pol_ch %>% get_downloads()
PRA_samp <- PRA_rec %>% samples()
Now we will filter
only charcoal
samples
with values as area
(function filter
of
dplyr
package). We also select only relevant columns, and
merge variablename
and elementtype
.
PRA_charcoal <- samples(PRA_rec) %>% dplyr::filter(ecologicalgroup == "CHAR" & elementtype == "area") %>%
dplyr::select(depth, age, variablename, elementtype, value) %>%
dplyr::mutate(variable = paste(variablename, elementtype, sep = " "), .keep = "unused") %>%
dplyr::group_by(depth, age) %>%
dplyr::arrange(depth)
In the next step, we pull in pollen samples that are measured using
NISP
(number of identified specimen), only pollen element
(to avoid e.g., stomata). Finally, we filter
taxa to
calculate pollen percentages for ecological groups of trees and shrubs
(TRSH
) and upland herbs (UPHE
). We also
produce a taxa table with ecological groups for plotting.
PRA_samp_short <- samples(PRA_rec) %>%
dplyr::filter(ecologicalgroup %in% c("TRSH","UPHE")) %>%
dplyr::filter(elementtype == "pollen") %>%
dplyr::filter(units == "NISP")
PRA_eco_g <- PRA_samp_short %>% dplyr::select(variablename, ecologicalgroup) %>% dplyr::distinct()
Finally, we need to transform the values to proportions using all TRSH and HERB taxa.
PRA_pollen_perc <- PRA_samp_short %>%
dplyr::group_by(depth, age) %>%
dplyr::mutate(pollencount = sum(value, na.rm = TRUE)) %>%
dplyr::group_by(variablename) %>%
dplyr::mutate(prop = value / pollencount) %>%
dplyr::arrange(desc(age))
For further use in other packages, it is useful to pivot the data to
a “wide” table, with variablenames
as column headings. We
do this for both pollen and charcoal datasets using
tidyr
.
PRA_pollen_perc <- PRA_pollen_perc %>%
dplyr::select(depth, age, variablename, prop) %>%
dplyr::mutate(prop = as.numeric(prop))
PRA_pollen_wide <- tidyr::pivot_wider(PRA_pollen_perc,
id_cols = c(depth, age),
names_from = variablename,
values_from = prop,
values_fill = 0)
PRA_charcoal_wide <- tidyr::pivot_wider(PRA_charcoal,
id_cols = c(depth, age),
names_from = variable,
values_from = value,
values_fill = 0)
To plot a stratigraphic diagram, we can use riojaPlot
and combine pollen and charcoal proxies. For pollen, we select 15 most
frequent TRSH taxa
to be plotted.
PRA_plot_taxa <- taxa(PRA_rec) %>% dplyr::filter(ecologicalgroup %in% c("TRSH")) %>%
dplyr::filter(elementtype == "pollen") %>%
dplyr::arrange(desc(samples)) %>%
head(n = 15)
Now, we calculate a constrained hierarchical cluster analysis using
CONISS
(constrained incremental sum-of-squares cluster
analysis). We further determine the significant zones (clusters) by the
broken stick model (bstick
- 7 splits / 8 zones).
PRA_clust <- chclust(vegdist(PRA_pollen_wide[,-1:-2], "euclidean"), method = "coniss")
plot(PRA_clust, hang = -1)
bstick(PRA_clust)
PRA_zones<-cutree(PRA_clust,7)
We plot the stratigraphic diagram with riojaPlot, and add determined zones.
PRA_plot1 <- riojaPlot(PRA_pollen_wide[,-1:-2]*100, PRA_pollen_wide[,1:2],
selVars = PRA_plot_taxa$variablename,
scale.percent = TRUE,
groups = PRA_eco_g,
plot.cumul = TRUE,
sec.yvar.name="age",
plot.sec.axis = TRUE,
srt.xlabel = 60,
xRight = 0.8)
PRA_plot2 <- addRPClust(PRA_plot1, PRA_clust, xLeft=0.8, xRight = 0.9)
addRPClustZone(PRA_plot2, PRA_clust, nZone=7, xRight = 0.9, col = "red")
PRA_plot3 <- riojaPlot(PRA_charcoal_wide[,-1:-2], PRA_charcoal_wide[,1:2],
minmax = data.frame(0,6),
plot.poly = FALSE,
plot.bar = TRUE,
plot.line = FALSE,
lwd.bar = 2,
col.bar = "black",
xLeft = 0.9,
riojaPlot = PRA_plot2)
To further adapt your stratigraphic plot, please refer to riojaPlot gallery.
First, we plot log-transformed charcoal area data on a time axis.
Then we add different trends: lowess
(locally-weighted
polynomial regression); loess
(fits a polynomial surface
determined by one or more numerical predictors, using local fitting);
smooth.spline
(fits a cubic smoothing spline).
plot(PRA_charcoal_wide$age, log(PRA_charcoal_wide$`Charcoal area`+1), ylim=c(0, 6), xlab = "Age cal BP", ylab = "CHAR")
lines(PRA_charcoal_wide$age, lowess(PRA_charcoal_wide$`Charcoal area`, f=0.1)$y, col="red")
lines(PRA_charcoal_wide$age, loess(PRA_charcoal_wide$`Charcoal area`~PRA_charcoal_wide$age, span = 0.2)$fitted, col="blue")
lines(PRA_charcoal_wide$age, smooth.spline(PRA_charcoal_wide$`Charcoal area`, df = 25)$y, col="green")
legend(1,6, legend = c("LOWESS", "LOESS", "smooth spline"), col = c("red", "blue", "green"), lty=1, cex=0.8)
To estimate the values for given depths/ages in the model, we need to
apply predict()
PRA_char_loess <- loess(PRA_charcoal_wide$`Charcoal area`~PRA_charcoal_wide$age, span = 0.20)
PRA_char_pred <- stats::predict(PRA_char_loess, PRA_pollen_wide$age)
We apply the Principle component analysis (PCA), here
rda()
, a linear unconstrained ordination method. As the
method uses Euclidean distance to calculate dissimilarity between
samples, we need to transform the data first to prevent the double-zero
problem by applying e.g., the Hellinger transformation (square-root)
using decostand
. Next, we plot the ordination using the
biplot
function, adding a fitted environmental variable
(charcoal) with envfit()
.
PRA_pca <- rda(decostand(PRA_pollen_wide[,-1:-2], method = "hellinger"))
PRA_PCA_plot <- biplot(PRA_pca)
plot(envfit(PRA_pca, data.frame(CHAR = PRA_char_pred)))
orditorp(PRA_PCA_plot, display = "species")
Now we fit a smooth surface of charcoal variable to the ordination plot:
PRA_ordisurf <- ordisurf(PRA_pca, PRA_char_pred, knots = 2)
The ordination plots are not ideal, especially with many samples and variables. Therefore, we may play around with different settings to produce a nicer ordination plot. First, we color samples according to their zones, then select a few taxa to display the main gradients. Finally, we plot over the smooth surface plot with charcoal.
tax.sc<-scores(PRA_pca, display="species", choice=1:2)
tax<-c("Pinus", "Betula", "Picea", "Fagus", "Poaceae", "Corylus")
ordiplot(PRA_pca, type = "n")
points(PRA_pca, col = PRA_zones + 6, pch = PRA_zones, lwd = 2)
for (i in tax) {arrows(0,0,tax.sc[i,1], tax.sc[i,2], length = 0.1, angle = 20, code = 2, lwd = 2, col = "red")
text(tax.sc[i,1]+0.05, tax.sc[i,2]+0.05, i , cex = 1.2, col = "red")
}
plot(PRA_ordisurf, add=TRUE)
legend(-1, 0, legend=c("zone 1", "zone 2", "zone 3", "zone 4", "zone 5", "zone 6", "zone 7"),
pch = c(1,2,3,4,5,6,7), pt.cex = 1.2, pt.lwd=2, bty = "n", col=c(7,8,9,10,11,12,13))
Here we perform the Detrended correspondence analysis (DCA) with
decorana
. Correspondence analysis is a weighted averaging
method. DCA detrends the first ordination axis by segments (hammer
effect) often used to determine the extent of compositional turnover (4
SD units = complete turnover) and decide whether the species response
are unimodal or linear. DCA also attempts to remove the arch effect from
ordination (compare the PCA and DCA ordination plots).
Check the gradient length - look at the axis length of DCA1
PRA_dca <- decorana(decostand(PRA_pollen_wide[,-1:-2], method = "hellinger"))
PRA_dca
##
## Call:
## decorana(veg = decostand(PRA_pollen_wide[, -1:-2], method = "hellinger"))
##
## Detrended correspondence analysis with 26 segments.
## Rescaling of axes with 4 iterations.
## Total inertia (scaled Chi-square): 1.159
##
## DCA1 DCA2 DCA3 DCA4
## Eigenvalues 0.1771 0.10365 0.05192 0.06781
## Additive Eigenvalues 0.1771 0.09761 0.04579 0.02194
## Decorana values 0.1850 0.08166 0.03437 0.02454
## Axis lengths 1.6670 1.23956 0.91720 1.17229
plot(PRA_dca, display = "sites")
We will use two additional packages, allowing faster calculation and better results plotting.
library(corrplot)
library(Hmisc)
First, we merge the data and log transform charcoal. Then we
calculate the correlation matrix using rcorr. Then we plot the
significant results (p < 0.01).
PRA_cor_data <- merge(PRA_pollen_wide, PRA_charcoal_wide)
PRA_cor_data$`Charcoal area` <- log(PRA_cor_data$`Charcoal area` + 1)
PRA_cor <- rcorr(as.matrix(PRA_cor_data[,-1:-2]), type = "pearson")
cor.data <- as.matrix(PRA_cor$r[89,which(PRA_cor$P[89,-89]<0.05)])
colnames(cor.data) <- "Charcoal"
corrplot(t(cor.data))