Introduction

We will introduce you to the Neotoma Paleoecology Database environment, including explorer, Tilia and the neotoma2 R package. The database uses an application programming interface (API), which serves the explorer as well as Tilia but also lets you query the database in R. We will show you which information you can most easily extract in either of these choices and provide a hands-on R exercise on working with charcoal and pollen data.

The neotoma2package is available from GitHub and can be installed in R using the devtools package. We also load other packages.

devtools::install_github('NeotomaDB/neotoma2')
library(neotoma2)
library(riojaPlot)
library(rioja)
library(vegan)

Get all sites

Sites in neotoma2 are primarily spatial objects. They have names, locations, and are found within the context of geopolitical units, but within the API and the package, the site itself does not have associated information about taxa, dataset types or ages. It is simply the container into which we add that information.

We can use several parameters to search sites, one of them is taxa. Here we search all sites containing “Charcoal” taxon and its variations.

char_sites <- get_sites(taxa = 'Charcoal%', all_data = TRUE)
neotoma2::summary(head(char_sites, 10))
##    siteid            sitename collectionunit chronolgies datasets         types
## 1       8 Abalone Rocks Marsh        ABALONE           0        1      charcoal
## 2     196  Hells Kitchen Lake       HELLSK05           0        1        pollen
## 3     214          Arts Lough       ARTSLOUG           0        1        pollen
## 4     273      Blacktail Pond          BTP06           0        1        pollen
## 5     459      Wentworth Lake           WENT           0        1 macrocharcoal
## 6     519      Cub Creek Pond       CUBCRK17           0        1        pollen
## 7     525         Cupola Pond        CUPOLA4           0        1 macrocharcoal
## 8    2303         Silver Lake       SILVER07           0        1      charcoal
## 9    2520   Stotzel-Leis Site         STOT14           0        1      charcoal
## 10   2570         Lake Tulane        TULANEG           0        1      charcoal

Now we get datasets to the sites, and filter charcoal dataset types and plot on the map.

char_ds <- char_sites %>% get_datasets(all_data = TRUE)
table(as.data.frame(datasets(char_ds))$datasettype)
## 
##                charcoal charcoal surface sample         dinoflagellates 
##                     228                      94                       1 
##           macrocharcoal           microcharcoal       plant macrofossil 
##                      28                      26                      30 
##                  pollen 
##                     414
char_mc <- char_ds %>% neotoma2::filter(datasettype %in% c("macrocharcoal", "charcoal", "microcharcoal"))
neotoma2::plotLeaflet(char_mc)

Using single site

Downloading charcoal and pollen data

As an example, we will use the site ‘Prášilské jezero’, a mountain lake located in Czechia (Šumava Mts.). This site has been analyzed for several proxies, including pollen and macrocharcoal.

First, we obtain the siteid and other metadata including its datasets for the chosen site and plot its position:

PRA_ds <- get_datasets(get_sites(sitename = "Prášilské jezero"), all_data = TRUE)
plotLeaflet(PRA_ds)

Filter out the datasets ‘pollen’ and ‘charcoal’

PRA_pol_ch <- PRA_ds %>% neotoma2::filter(datasettype %in% c("pollen", "charcoal"))

And pull in and extract the sample data

PRA_rec <- PRA_pol_ch %>% get_downloads()
PRA_samp <- PRA_rec %>% samples()

Filter samples

Now we will filter only charcoal samples with values as area (function filter of dplyr package). We also select only relevant columns, and merge variablename and elementtype

PRA_charcoal <- samples(PRA_rec) %>% dplyr::filter(ecologicalgroup == "CHAR" & elementtype == "area") %>%
  dplyr::select(depth, age, variablename, elementtype, value) %>%
  dplyr::mutate(variable = paste(variablename, elementtype, sep = " "), .keep = "unused") %>%
  dplyr::group_by(depth, age) %>%
  dplyr::arrange(depth)

In the next step, we pull in pollen samples that are measured using NISP (number of identified specimen), only pollen element (to avoid e.g., stomata). Finally, we filter taxa to calculate pollen percentages for ecological groups of trees and shrubs (TRSH) and upland herbs (UPHE). We also produce a taxa table with ecological groups for plotting.

PRA_samp_short <- samples(PRA_rec) %>% 
  dplyr::filter(ecologicalgroup %in% c("TRSH","UPHE")) %>%
  dplyr::filter(elementtype == "pollen") %>%
  dplyr::filter(units == "NISP")
PRA_eco_g <- PRA_samp_short %>% dplyr::select(variablename, ecologicalgroup) %>% dplyr::distinct()

Data transformation

Finally, we need to transform the values to proportions using all TRSH and HERB taxa.

PRA_pollen_perc <- PRA_samp_short %>%
  dplyr::group_by(depth, age) %>%
  dplyr::mutate(pollencount = sum(value, na.rm = TRUE)) %>%
  dplyr::group_by(variablename) %>% 
  dplyr::mutate(prop = value / pollencount) %>% 
  dplyr::arrange(desc(age))

For further use in other packages, it is useful to pivot the data to a “wide” table, with variablenames as column headings. We do this for both pollen and charcoal datasets using tidyr.

PRA_pollen_perc <- PRA_pollen_perc %>%
  dplyr::select(depth, age, variablename, prop) %>% 
  dplyr::mutate(prop = as.numeric(prop))

PRA_pollen_wide <- tidyr::pivot_wider(PRA_pollen_perc,
                             id_cols = c(depth, age),
                             names_from = variablename,
                             values_from = prop,
                             values_fill = 0)

PRA_charcoal_wide <- tidyr::pivot_wider(PRA_charcoal,
                                        id_cols = c(depth, age),
                                        names_from = variable,
                                        values_from = value,
                                        values_fill = 0)

Stratigraphic plotting

To plot a stratigraphic diagram, we can use riojaPlot and combine pollen and charcoal proxies. For pollen, we select 15 most frequent TRSH taxa to be plotted.

PRA_plot_taxa <- taxa(PRA_rec) %>% dplyr::filter(ecologicalgroup %in% c("TRSH")) %>%
  dplyr::filter(elementtype == "pollen") %>%
  dplyr::arrange(desc(samples)) %>% 
  head(n = 15) 

Zonation

Now, we calculate a constrained hierarchical cluster analysis using CONISS (constrained incremental sum-of-squares cluster analysis). We further determine the significant zones (clusters) by the broken stick model (bstick - 7 splits / 8 zones).

PRA_clust <- chclust(vegdist(PRA_pollen_wide[,-1:-2], "euclidean"), method = "coniss")
plot(PRA_clust, hang = -1)

bstick(PRA_clust)

PRA_zones<-cutree(PRA_clust,7)

We plot the stratigraphic diagram with riojaPlot, and add determined zones.

PRA_plot1 <- riojaPlot(PRA_pollen_wide[,-1:-2]*100, PRA_pollen_wide[,1:2],
          selVars = PRA_plot_taxa$variablename,
          scale.percent = TRUE,
          groups = PRA_eco_g,
          plot.cumul = TRUE,
          sec.yvar.name="age",
          plot.sec.axis = TRUE,
          srt.xlabel = 60,
          xRight = 0.8)
PRA_plot2 <- addRPClust(PRA_plot1, PRA_clust, xLeft=0.8, xRight = 0.9)
addRPClustZone(PRA_plot2, PRA_clust, nZone=7, xRight = 0.9, col = "red")
PRA_plot3 <- riojaPlot(PRA_charcoal_wide[,-1:-2], PRA_charcoal_wide[,1:2], 
          minmax = data.frame(0,6),
          plot.poly = FALSE,
          plot.bar = TRUE,
          plot.line = FALSE,
          lwd.bar = 2,
          col.bar = "black",
          xLeft = 0.9,
          riojaPlot = PRA_plot2)

To further adapt your stratigraphic plot, please refer to riojaPlot gallery.

Comparing the trend in charcoal to variability in pollen composition

Show the trend in charcoal deposition

First, we plot log-transformed charcoal area data on a time axis. Then we add different trends: lowess (locally-weighted polynomial regression); loess (fits a polynomial surface determined by one or more numerical predictors, using local fitting); smooth.spline (fits a cubic smoothing spline).

plot(PRA_charcoal_wide$age, log(PRA_charcoal_wide$`Charcoal area`+1), ylim=c(0, 6), xlab = "Age cal BP", ylab = "CHAR")
lines(PRA_charcoal_wide$age, lowess(PRA_charcoal_wide$`Charcoal area`, f=0.1)$y, col="red")
lines(PRA_charcoal_wide$age, loess(PRA_charcoal_wide$`Charcoal area`~PRA_charcoal_wide$age, span = 0.2)$fitted, col="blue")
lines(PRA_charcoal_wide$age, smooth.spline(PRA_charcoal_wide$`Charcoal area`, df = 25)$y, col="green")
legend(1,6, legend = c("LOWESS", "LOESS", "smooth spline"), col = c("red", "blue", "green"), lty=1, cex=0.8)

To estimate the values for given depths/ages in the model, we need to apply predict()

PRA_char_loess <- loess(PRA_charcoal_wide$`Charcoal area`~PRA_charcoal_wide$age, span = 0.20)
PRA_char_pred <- stats::predict(PRA_char_loess, PRA_pollen_wide$age)

Multivariate analysis (ordination)

Principle component analysis

We apply the Principle component analysis (PCA), here rda(), a linear unconstrained ordination method. As the method uses Euclidean distance to calculate dissimilarity between samples, we need to transform the data first to prevent the double-zero problem by applying e.g., the Hellinger transformation (square-root) using decostand. Next, we plot the ordination using the biplot function, adding a fitted environmental variable (charcoal) with envfit().

PRA_pca <- rda(decostand(PRA_pollen_wide[,-1:-2], method = "hellinger"))
PRA_PCA_plot <- biplot(PRA_pca)
plot(envfit(PRA_pca, data.frame(CHAR = PRA_char_pred)))
orditorp(PRA_PCA_plot, display = "species")

Now we fit a smooth surface of charcoal variable to the ordination plot:

PRA_ordisurf <- ordisurf(PRA_pca, PRA_char_pred, knots = 2)

The ordination plots are not ideal, especially with many samples and variables. Therefore, we may play around with different settings to produce a nicer ordination plot. First, we color samples according to their zones, then select a few taxa to display the main gradients. Finally, we plot over the smooth surface plot with charcoal.

tax.sc<-scores(PRA_pca, display="species", choice=1:2)
tax<-c("Pinus", "Betula", "Picea", "Fagus", "Poaceae", "Corylus")

ordiplot(PRA_pca, type = "n")
points(PRA_pca, col = PRA_zones + 6, pch = PRA_zones, lwd = 2)
for (i in tax) {arrows(0,0,tax.sc[i,1], tax.sc[i,2], length = 0.1, angle = 20, code = 2, lwd = 2, col = "red")
  text(tax.sc[i,1]+0.05, tax.sc[i,2]+0.05, i , cex = 1.2, col = "red")
  }
plot(PRA_ordisurf, add=TRUE)
legend(-1, 0, legend=c("zone 1", "zone 2", "zone 3", "zone 4", "zone 5", "zone 6", "zone 7"),
       pch = c(1,2,3,4,5,6,7), pt.cex = 1.2, pt.lwd=2, bty = "n", col=c(7,8,9,10,11,12,13))

Detrended correspondence analysis

Here we perform the Detrended correspondence analysis (DCA) with decorana. Correspondence analysis is a weighted averaging method. DCA detrends the first ordination axis by segments (hammer effect) often used to determine the extent of compositional turnover (4 SD units = complete turnover) and decide whether the species response are unimodal or linear. DCA also attempts to remove the arch effect from ordination (compare the PCA and DCA ordination plots).

Check the gradient length - look at the axis length of DCA1

PRA_dca <- decorana(decostand(PRA_pollen_wide[,-1:-2], method = "hellinger"))
PRA_dca
## 
## Call:
## decorana(veg = decostand(PRA_pollen_wide[, -1:-2], method = "hellinger")) 
## 
## Detrended correspondence analysis with 26 segments.
## Rescaling of axes with 4 iterations.
## Total inertia (scaled Chi-square): 1.159 
## 
##                        DCA1    DCA2    DCA3    DCA4
## Eigenvalues          0.1771 0.10365 0.05192 0.06781
## Additive Eigenvalues 0.1771 0.09761 0.04579 0.02194
## Decorana values      0.1850 0.08166 0.03437 0.02454
## Axis lengths         1.6670 1.23956 0.91720 1.17229
plot(PRA_dca, display = "sites")

Check correlation between charcoal and pollen taxa

We will use two additional packages, allowing faster calculation and better results plotting.

library(corrplot)
library(Hmisc)

First, we merge the data and log transform charcoal. Then we calculate the correlation matrix using rcorr. Then we plot the significant results (p < 0.01).

PRA_cor_data <- merge(PRA_pollen_wide, PRA_charcoal_wide)
PRA_cor_data$`Charcoal area` <- log(PRA_cor_data$`Charcoal area` + 1)
PRA_cor <- rcorr(as.matrix(PRA_cor_data[,-1:-2]), type = "pearson")

cor.data <- as.matrix(PRA_cor$r[89,which(PRA_cor$P[89,-89]<0.05)])
colnames(cor.data) <- "Charcoal"

corrplot(t(cor.data))