library(neotoma2)
library(riojaPlot)
library(rioja)
library(vegan)
library(ggplot2)
library(ggvegan)
library(ggrepel)
library(dplyr)COST PalaeOpen - Aquatic proxies workshop
neotoma2 R exercise
Introduction
We will introduce you to the Neotoma Paleoecology Database environment, including explorer, Tilia and the neotoma2 R package. The database uses an application programming interface (API), which serves the explorer as well as Tilia but also lets you query the database in R. We will show you which information you can most easily extract in either of these choices and provide a hands-on R exercise on working with diatom, chironomid, and charcoal data.
The neotoma2package is available from CRAN and can be installed in R using install.package(neotoma2). We also load other packages needed for the exercise.
Get all sites with diatom data in Europe and plot on map
Sites in neotoma2 are primarily spatial objects. They have names, locations, and are found within the context of geopolitical units, but within the API and the package, the site itself does not have associated information about taxa, dataset types or ages. It is simply the container into which we add that information.
We can use several parameters to search sites, one of them is datasettype. Here we search all sites containing “diatom” dataset type in Europe.
europe <- list(geoJSON = '{"type": "Polygon",
"coordinates": [[
[-30, 30],
[70, 30],
[70, 90],
[-30, 90],
[-30, 30]]]}')
sites_diat <- get_sites(datasettype = "diatom", loc = europe$geoJSON, all_data = TRUE)
neotoma2::summary(head(sites_diat, 20)) siteid sitename collectionunit chronologies datasets types
1 3377 Řežabinec REZABIN 0 1 diatom
2 3522 Zirbenwaldmoor ZIRBEN1 0 1 diatom
3 14267 Didachara DIDA-C 0 1 diatom
4 15702 Lago Grande di Avigliana AVG0702 0 1 diatom
5 26991 Prášilské jezero PRA-15 0 1 diatom
6 29340 Hallwilersee HALL91-D 0 1 diatom
7 30451 Horseshoe Point 204 HOW0204 0 1 diatom
8 30452 Horseshoe Point 206 HOW0206 0 1 diatom
9 30453 Horseshoe Point 207 HOW0207 0 1 diatom
10 30454 Horseshoe Point 208 HOW0208 0 1 diatom
11 30614 VC35 (BP080094) VC35DI 0 1 diatom
12 30983 Sizewell SZC_VC07 0 1 diatom
13 30983 Sizewell SZC_VC08 0 1 diatom
14 30983 Sizewell SZC_VC16 0 1 diatom
15 30983 Sizewell SZC_VC21 0 1 diatom
Now we get datasets to the sites, and filter charcoal dataset types and plot on the map.
datasets_diat <- sites_diat %>% get_datasets(all_data = TRUE)
neotoma2::plotLeaflet(datasets_diat)Using a single site
Downloading diatom, chironomid and charcoal data
As an example, we will use the site ‘Prášilské jezero’, a mountain lake located in Czechia (Šumava Mts.). This site has been analyzed for several proxies, including diatoms, chironomids and macrocharcoal.
First, we obtain the siteid and other metadata including its datasets for the chosen site and plot its position:
site_ds <- get_datasets(get_sites(sitename = "Prášilské jezero"), all_data = TRUE)
plotLeaflet(site_ds)We can list all available datasets for the site:
datasets(site_ds) datasetid database datasettype age_range_old
47517 European Pollen Database geochronologic NA
47518 European Pollen Database plant macrofossil 11985.0
47519 European Pollen Database charcoal 11954.0
47520 European Pollen Database pollen 11954.0
55585 European Pollen Database diatom 11426.0
55586 European Pollen Database chironomid 11451.0
47521 European Pollen Database geochronologic NA
47522 European Pollen Database plant macrofossil 10641.1
47523 European Pollen Database geochronologic NA
47524 European Pollen Database plant macrofossil 9679.1
age_range_young notes
NA <NA>
-58.0 <NA>
-58.0 <NA>
-58.0 <NA>
-60.0 <NA>
-60.0 <NA>
NA <NA>
-60.5 <NA>
NA <NA>
-58.4 <NA>
Filter out the datasets ‘diatom’, ‘chironomid’ and ‘charcoal’
site_ds <- site_ds %>% neotoma2::filter(datasettype %in% c("diatom", "chironomid", "charcoal"))And pull in and extract the sample data
site_rec <- site_ds %>% get_downloads()
site_samp <- site_rec %>% samples()Filter samples
Now we will filter only charcoal samples with values as area (function filter of dplyr package). We also select only relevant columns, and merge variablename and elementtype.
site_charcoal <- samples(site_rec) %>% dplyr::filter(ecologicalgroup == "CHAR" & elementtype == "area") %>%
dplyr::select(depth, age, variablename, elementtype, units, value) %>%
dplyr::mutate(variable = paste(variablename, elementtype, sep = " "), .keep = "unused") %>%
dplyr::group_by(depth, age) %>%
dplyr::arrange(depth)In the next step, we pull in diatom samples.
site_samp_diat <- samples(site_rec) %>%
dplyr::filter(taxongroup == "Diatoms")Data transformation
Finally, we need to transform the diatom frustule counts to proportions.
site_diat_perc <- site_samp_diat %>%
dplyr::group_by(depth, age) %>%
dplyr::mutate(diatcount = sum(value, na.rm = TRUE)) %>%
dplyr::group_by(variablename) %>%
dplyr::mutate(prop = value / diatcount) %>%
dplyr::arrange(desc(age))For further use in other packages, it is useful to pivot the data to a “wide” table, with variablenames as column headings. We do this for both diatom and charcoal datasets using tidyr (note that charcoal is in concentration units).
site_diat_perc <- site_diat_perc %>%
dplyr::select(depth, age, variablename, prop) %>%
dplyr::mutate(prop = as.numeric(prop))
site_diat_wide <- tidyr::pivot_wider(site_diat_perc,
id_cols = c(depth, age),
names_from = variablename,
values_from = prop,
values_fill = 0)
site_charcoal_wide <- tidyr::pivot_wider(site_charcoal,
id_cols = c(depth, age),
names_from = variable,
values_from = value,
values_fill = 0)Stratigraphic plotting
To plot a stratigraphic diagram, we can use riojaPlot. We select 15 most frequent diatom taxa to be plotted.
site_plot_taxa <- taxa(site_rec) %>% dplyr::filter(taxongroup == "Diatoms") %>%
dplyr::arrange(desc(samples)) %>%
head(n = 15)Zonation
Now, we calculate a constrained hierarchical cluster analysis using CONISS (constrained incremental sum-of-squares cluster analysis) using the chord distance. We further determine the significant zones (clusters) by the broken stick model (4 splits / 5 zones).
site_clust <- chclust(vegdist(site_diat_wide[,-1:-2], "chord"), method = "coniss")
plot(site_clust, hang = -1)bstick(site_clust)We plot the stratigraphic diagram with riojaPlot, and add significant zones.
riojaPlot(site_diat_wide[,-1:-2]*100, site_diat_wide[,1:2],
selVars = site_plot_taxa$variablename,
scale.percent = TRUE,
sec.yvar.name="age",
plot.sec.axis = TRUE,
srt.xlabel = 60,
xRight = 0.85) |>
addRPClust(site_clust) |>
addRPClustZone(site_clust, col = "red")To further adapt your stratigraphic plot, please refer to riojaPlot gallery.
Comparing the trend in charcoal to variability in diatom composition
Show the trend in charcoal deposition
First, we plot log-transformed charcoal area data on a time axis. Then we add different trends: lowess (locally-weighted polynomial regression); loess (fits a polynomial surface determined by one or more numerical predictors, using local fitting); smooth.spline (fits a cubic smoothing spline).
plot(site_charcoal_wide$age, log(site_charcoal_wide$`Charcoal area`+1), ylim=c(0, 6), xlab = "Age cal BP", ylab = "CHAR")
lines(site_charcoal_wide$age, lowess(site_charcoal_wide$`Charcoal area`, f=0.1)$y, col="red")
lines(site_charcoal_wide$age, loess(site_charcoal_wide$`Charcoal area`~site_charcoal_wide$age, span = 0.2)$fitted, col="blue")
lines(site_charcoal_wide$age, smooth.spline(site_charcoal_wide$`Charcoal area`, df = 25)$y, col="green")
legend(1,6, legend = c("LOWESS", "LOESS", "smooth spline"), col = c("red", "blue", "green"), lty=1, cex=0.8)To estimate the values for given depths/ages in the model, we need to apply predict(). We further use loess interpolation.
site_char_loess <- loess(site_charcoal_wide$`Charcoal area`~site_charcoal_wide$depth, span = 0.20)
site_char_pred <- stats::predict(site_char_loess, site_diat_wide$depth)Multivariate analysis (ordination)
Principle component analysis
We apply the Principle component analysis (PCA), here rda(), a linear unconstrained ordination method. As the method uses Euclidean distance to calculate dissimilarity between samples, we need to transform the data first to prevent the double-zero problem by applying e.g., the Hellinger transformation (square-root) using decostand. We also extract the PCA scores with fortify(), and use only 15 most abundant species. Next, we fit charcoal as environmental variable to the PCA analysis using envfit().
site_pca <- rda(decostand(site_diat_wide[,-1:-2], method = "hellinger"))
site_PCA_species <- site_diat_wide[,-1:-2] %>%
select(order(colSums(.), decreasing = T))
site_scores <- fortify(site_pca)
site_scores_species <- subset(site_scores, score == "species" & label %in% head(colnames(site_PCA_species), 15))
site_PCA_env <- envfit(site_pca, data.frame(CHAR = site_char_pred))
site_PCA_env_scores <- fortify(site_PCA_env)Next, we plot the ordination using the ggplot function, adding a fitted environmental variable (charcoal) with envfit().
ggplot() +
geom_point(data = site_scores_species, aes(x=PC1, y=PC2), color = "#117733", size = 2) +
geom_text_repel(data = site_scores_species, aes(x = PC1, y = PC2, label = label), color = "#117733", size = 5) +
geom_segment(data = site_PCA_env_scores, aes(x = 0, y = 0, xend = PC1, yend = PC2),
arrow = arrow(length = unit(0.2, "cm")), color = "#D55E00") +
geom_text_repel(data = site_PCA_env_scores, aes(x = PC1, y = PC2, label = label), color = "#D55E00", size = 5) +
geom_vline(xintercept = 0, linetype = "dashed", color = "darkgray") +
geom_hline(yintercept = 0, linetype = "dashed", color = "darkgray") +
theme_classic() +
theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 1),
axis.line = element_blank(),
axis.text = element_text(size = 10)
) +
labs(x = "PC1", y = "PC2")Detrended correspondence analysis
Here we perform the Detrended correspondence analysis (DCA) with decorana. Correspondence analysis is a weighted averaging method. DCA detrends the first ordination axis by segments (hammer effect) often used to determine the extent of compositional turnover (4 SD units = complete turnover) and decide whether the species response are unimodal or linear. DCA also attempts to remove the arch effect from ordination (compare the PCA and DCA ordination plots).
Check the gradient length - look at the axis length of DCA1
site_dca <- decorana(decostand(site_diat_wide[,-1:-2], method = "hellinger"))
site_dca
Call:
decorana(veg = decostand(site_diat_wide[, -1:-2], method = "hellinger"))
Detrended correspondence analysis with 26 segments.
Rescaling of axes with 4 iterations.
Total inertia (scaled Chi-square): 2.0524
DCA1 DCA2 DCA3 DCA4
Eigenvalues 0.2985 0.1344 0.09716 0.07879
Additive Eigenvalues 0.2985 0.1343 0.09604 0.07763
Decorana values 0.3067 0.1425 0.11246 0.06256
Axis lengths 1.6684 1.9264 1.78166 1.31981
plot(site_dca, display = "sites")