Author

Paul Lincoln

This document details the methods used to run standard analyses for multivariate XRF data (e.g. central log-ratios, PCAs, clustering) in RStudio for the ITRAX XRF from Nautajarvi in Finland. The document is split into three sections. First, the code used format and analyse the XRF data are presented in Section 1. Code used to plot results are shown in Section 2 and supplementary data and figures are presented in Section 3.

1 Statistical analyses

First, clear the work environment to remove residual objects from any previous analyses.

rm(list = ls(all.names=T))

1.1 Load required libraries and data

Load in the libraries required to run the analyses

library(pacman)

p_load(tidyverse, compositions, readxl, zoo, corrplot, robCompositions, car, factoextra,janitor, ggrepel,dendextend,jpeg, gttable, NbClust, gt, ComplexHeatmap, grid)

The following code reads in the raw XRF data into an R object called ‘XRF’, replicate variances into an object called ‘variance’, the Nautajarvi varve thicknesses from Ojala and Alenius (2005) into an object called ‘VT’, and the maximum varve depth (mm) in the composite stratigraphy into an object called ‘max_depth’.

Show code
# Set the file directory for the RData files
RDATdir <- '/Users/paullincoln/Dropbox/2024/Papers/Nautajarvi second draft/Third draft/R_Figures/RData_files/'
#Set a figure directory to save output plots
figdir <- '/Users/paullincoln/Dropbox/2024/Papers/Nautajarvi second draft/Third draft/R_Figures'


#Load in XRF, VT, variance and max_depth objects
load(file = paste0(RDATdir,'XRF_Varve_data.RData', sep = ''))

1.2 Transform and log the data for further analyses

Raw XRF measurements are affected by variability in the physical properties of the host sediment & compositional constant-sum constraints. This means that element intensities do not linearly represent concentration. To account for this, values are log ratioed and centre-log ratioed (CLR).

First, we calculate the normal log ratio between elements of interest. Here, the log function is used to calculate log ratios. The values are then written to the XRF data frame.

Show code
#add in log ratios, calculated prior to CLR transformation
XRF$`ln(inc/coh)` <- log(XRF$`Rh inc`/XRF$`Rh coh`)
XRF$`ln(Fe/Mn)` <- log(XRF$Fe/XRF$Mn)
XRF$`ln(Fe/Ti)` <- log(XRF$Fe/XRF$Ti)
XRF$`ln(Mn/Ti)` <- log(XRF$Mn/XRF$Ti)

We can now central log ratio our data. Note that the column numbers here are specific to the Nautajarvi data frame file. These will change for other datasets so make sure that they are altered correctly to cover just the raw element data

Show code
#re-arrange data frame columns, the number reflects the column number from the left.
XRF <-
  XRF[c(
    'DepthID [mm]',
    'Age (calBP)',
    'Age (yrCE)',
    '1% MCE',
    'Ca',
    'Fe',
    'K',
    'Mn',
    'S',
    'Si',
    'Ti',
    'Rh coh',
    'Rh inc',
    'ln(Fe/Ti)',
    'ln(Mn/Ti)',
    'ln(inc/coh)'
  )]

#clr XRF data.
CLR <- cenLR(XRF[c(5:13)])
CLR <- data.frame(CLR$x.clr)
#write the CLR transformed data back into the XRF data frame
XRF[c(5:13)] <- CLR[c(1:9)]

#Preserve total dataset for plotting in Figure 2
XRF_total <- XRF
#filter data to cover only varved section of the NAU-23 sequence (max_depth object notes maximum depth of marker horizons used to transfer varve chronology to NAU-23)
XRF <- XRF %>% filter(`DepthID [mm]` <= max_depth)

1.3 Scale for variance

Prior to multivariate statistical analyses, the data are scaled according to the replicate variance. This is done to statistically downweight elements with low analytical precision. This follows the methodology adopted in the Xelerate software for XRF analyses (Weltje et al. (2015)).

Show code
# Scale data for replicate variance to account for variable precision.
#re-order variance columns to match XRF
colorder<-colnames(XRF)[5:13]

#Remove Cu from variance df.
variance <- subset(variance, select = -Cu)

variance <- variance[, c(colorder, setdiff(names(variance), colorder))]

#Normalize the weights so that sum equals one
normalized_variance <- 1-(variance / sum(variance)) 
# Multiply the variable variance values to scale each element relative to analytical precision. Note only the clr transformed raw elements are included here
scaled_data <- as.data.frame(mapply(`*`, XRF[c(5:13)], normalized_variance))
#Create a matrix with selected elements to run clustering algorithm
clust_XRF<- as.matrix(scaled_data[c(1:7)])

1.4 Ward’s Hierarchical clustering.

The number of clusters is pre-selected here, but is guided by assessment of the dendrogram, sedimentological core descriptions and the results of NBClust (Section 3.0.2).

Show code
#ward's hierarchical clustering
k <- 4 #choose number of clusters. 
clust_col <- c('steelblue','forestgreen','firebrick1','yellow2')
clust_col_num <- c('1','2','3', '4')


set.seed(123)
dend <- scale(clust_XRF) %>%
  dist('euclidean') %>%
  hclust(method= 'ward.D') %>%
  as.dendrogram %>%
  dendextend::set("branches_k_color", k = k, value = clust_col)
# Plot dendrogram with colored branches and no labels


# Extract labels/sample and corresponding branches
labels <- labels(dend)
branch_colors <- dend %>% get_leaves_branches_col


# Create a data frame containing labels assigned to row numbers
c <- data.frame(
  label = as.numeric(labels),
  branch_color = branch_colors
) 
# Order by label value/ row
c <- c[order(c$label), ]
#mutate cluster colours back to numbers for plotting
c <- c %>% mutate(branch_color = dplyr::case_when(branch_color ==  clust_col[1] ~ '1',
                                                  branch_color ==  clust_col[2] ~ '2',
                                                  branch_color ==  clust_col[3] ~ '3', 
                                                  branch_color ==  clust_col[4] ~ '4'))
#write back into XRF and scaled_data files
scaled_data$cluster <- c$branch_color
XRF$cluster <- c$branch_color

# Count occurrences of 4 in a 100-row rolling window
XRF <- XRF %>%
  dplyr::mutate(cluster_4 = zoo::rollapply(cluster, width = 100, FUN = function(x) sum(x == 4), align = "right", fill = NA),
                cluster_3 = zoo::rollapply(cluster, width = 100, FUN = function(x) sum(x == 3), align = "right", fill = NA),
                cluster_2 = zoo::rollapply(cluster, width = 100, FUN = function(x) sum(x == 2), align = "right", fill = NA),
                cluster_1 = zoo::rollapply(cluster, width = 100, FUN = function(x) sum(x == 1), align = "right", fill = NA))

1.5 Principal components analysis (PCA)

Show code
#Perform the PCA using the princomp function on the scaled data and write output to a new object. Note that the clr-transformed data scaled for replicate variance are used here, so secondary scaling is not applied
data.pca <- princomp(scaled_data[c('Si','S','K','Ca','Ti','Mn','Fe')])

summary(data.pca)
Importance of components:
                         Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
Standard deviation     0.710104 0.3062083 0.17439004 0.15218678 0.08735206
Proportion of Variance 0.758775 0.1410922 0.04576278 0.03485161 0.01148194
Cumulative Proportion  0.758775 0.8998671 0.94562991 0.98048153 0.99196347
                            Comp.6     Comp.7
Standard deviation     0.067605426 0.02775294
Proportion of Variance 0.006877524 0.00115901
Cumulative Proportion  0.998840990 1.00000000
Show code
#Assign depositional stages to subset data for PCAs
pcaval<-as.data.frame(data.pca$scores)
XRF$PC1 <- pcaval$Comp.1
XRF$PC2 <- pcaval$Comp.2

rm(pcaval)

2 Figures

This section presents the code used to plot the figures in the manuscript and supplementary information.

2.1 Format data for plotting

Show code
#set the figure output directory. Code will export pdfs of all plots into this folder.
setwd(figdir)

#load in meterological data from Halli weather station for the time series plots in Figure 1.
load(paste0(RDATdir,'Fig1_Halli_Data.RData'))

#load in core photo images for the plots in Figures 3-4. This object contains three files. G1_chronology is the varve chronology for the surface gravity core taken from the NAU-23 stratigraphy. G1_image is the corresponding core photo for the gravity core. Core_image is a JPEG of the core section used in Figure 3.  
load(paste0(RDATdir, 'Image_plots.RData'))

#Load in external data for discussion figures. 
load(paste0(RDATdir,'Discussion_Figs_External_Data.RData'))

#set uniform figure colours
clust_col <- clust_col
PCA_col <- c('orange2', 'purple3') #PCA colours

#add new column to list phases
XRF <- XRF %>%
  mutate(
    Phase = case_when(
      `Age (calBP)` > 7100 & `Age (calBP)` <= 9900 ~ "Phase 1",
      `Age (calBP)` >= 5000 & `Age (calBP)` <= 7100 ~ "Phase 2",
      `Age (calBP)` >= -80 & `Age (calBP)` <= 5000 ~ "Phase 3",
      TRUE ~ "Other" # In case the age does not fall into any of the defined phases
    )
  )

#transform XRF data to mean annual resolution via linear interpolation for discussion plots
XRFannual <- XRF[c(2,5:13,22,23)]

XRFannual$`Varve (yr BP)` <- trunc(XRFannual$`Age (calBP)`)
nam<-colnames(XRFannual[c(1,2:12)]) 
XRFannual <- XRFannual[c(13,2:12)] %>% group_by(`Varve (yr BP)`) %>%
  summarise(across(everything(), list(mean)))
#XRFannual[9898,1] <-XRFannual[9890,1]+1 #bottom year deleted so need to add it back in
colnames(XRFannual) <- nam

2.2 Figure 1: Halli station data

Show code
WIND_Met <- WIND_Met %>% mutate(`Mean wind speed anomaly (m/s)` = `Monthly mean wind speed (m/s)`- mean(`Monthly mean wind speed (m/s)`),
                                Color = ifelse(`Mean wind speed anomaly (m/s)` >0 , 'red', 'blue'))

Te<-ggplot(TEMP_Met, aes(x=Month+0.5)) + 
  geom_rect(aes(xmin = 0.5, xmax = 5.1, ymin = -Inf, ymax = Inf), fill =  'grey80', color = 'black', linetype = "dashed", alpha = 0.25) +
  geom_rect(aes(xmin = 12, xmax = 12.9, ymin = -Inf, ymax = Inf), fill = 'grey80', color = 'black', linetype = "dashed", alpha = 0.25)+
  geom_ribbon(aes(ymin = `T average min monthly`, ymax = `T average max monthly`), fill = 'red', alpha = 0.25)+
  geom_path(aes(y= `T average (°C)`)) +
  geom_hline(yintercept = TEMP_Met$`T average (°C)`[13], color = 'red', linetype = 'dashed')+
  ggpubr::theme_pubr() + scale_x_continuous(limits = c(0.5,12.9), breaks = seq(1,12,1), expand = c(0,0)) +
  scale_y_continuous(limits = c(-12,23), breaks = seq(-20,30, 2), expand = c(0,0)) + labs(x = 'Month' , y= expression(Temperature~(degree*C)))


P<-ggplot(PREC_Met, aes(x=Month+0.5)) + 
  geom_rect(aes(xmin = 0.5, xmax = 5.1, ymin = -Inf, ymax = Inf), fill =  'grey80', color = 'black', linetype = "dashed", alpha = 0.25) +
  geom_rect(aes(xmin = 12, xmax = 12.9, ymin = -Inf, ymax = Inf), fill = 'grey80', color = 'black', linetype = "dashed", alpha = 0.25)+
  # geom_crossbar(aes(ymin = 0, y = `Max Prec during day (mm)`, ymax = `Max Prec during day (mm)`), fill = 'navy', width = 0.5, alpha = 0.5)+
  geom_ribbon(aes(ymin = `Min (mm)`, ymax = `Max (mm)`), fill = 'blue3', alpha = 0.5)+
  geom_path(aes(y= `Prec average (mm)`)) +
  geom_hline(yintercept = PREC_Met_annual$`Prec average (mm)`/12, color = 'navy', linetype = 'dashed')+
  ggpubr::theme_pubr() + scale_x_continuous(limits = c(0.5,12.9), breaks = seq(1,12,1), expand = c(0,0)) +
  scale_y_continuous(limits = c(0,180), breaks = seq(0,500, 20), expand = c(0,0)) + labs(x = 'Month' , y= 'Precipitation (mm)')

W <- ggplot(WIND_Met, aes(x = Month+0.5)) +  
  geom_rect(aes(xmin = 0.5, xmax = 5.1, ymin = -Inf, ymax = Inf), fill =  'grey80', color = 'black', linetype = "dashed", alpha = 0.25) +
  geom_rect(aes(xmin = 12, xmax = 12.9, ymin = -Inf, ymax = Inf), fill = 'grey80', color = 'black', linetype = "dashed", alpha = 0.25)+
  geom_crossbar(aes(ymin = 0, ymax =`Mean wind speed anomaly (m/s)`,  y = `Mean wind speed anomaly (m/s)`,  fill = Color),
                width = 0.5) +
  geom_path(aes(y = `Mean wind speed anomaly (m/s)`), color = 'black')+
  geom_hline(yintercept =0, color = 'black') + ggpubr::theme_pubr()+
  scale_fill_identity()+
  scale_x_continuous(limits = c(0.5,12.9), breaks = seq(1,12,1), expand = c(0,0)) + labs(x = 'Month' , y= 'Wind speed anomaly (m/s)')


Figure_1 <- cowplot::plot_grid(Te,P,W, nrow = 1, align = 'hv')


ggsave(filename = file.path(figdir, "Figure_1.pdf"), plot = Figure_1, width = 305.30292, height = 57.37082, units = "mm")

# Render the plot with the correct dimensions
#knitr::opts_chunk$set(fig.width = 305.30292 / 25.4, fig.height = 67.37082 / 25.4) # convert mm to inches
print(Figure_1)

2.3 Figure 2: XRF elements against depth

Show code
#set the working directory to a folder to save the files 

#Ilustrator struggles to plot line data >~30000 units so the data can be reduced here purely for plotting purposes
XRFplot <- XRF_total
XRFplot<-XRF_total %>%
 dplyr::filter(row_number() %% 2 == 1) #optionally reduce row numbers here to aid plotting in Illustrator

#write a plot function
plot_function <- function(element, rollmean, ax_lab){
  
  ggplot(XRFplot) +
    # geom_rug(aes(y= as.numeric(`DepthID*`/10),color = as.factor(cluster))) +
    geom_path(aes(y= as.numeric(`DepthID [mm]`/10), x= rollmean(element, 1, na.pad = T)),color = 'grey88', linewidth = 0.2)+
    geom_path(aes(y= as.numeric(`DepthID [mm]`/10), x= rollmean(element, rollmean, na.pad = T)),linewidth = 0.5) +ggpubr::theme_pubr() + scale_y_reverse(limits = c(724.96,0), breaks = seq(0,1000,50), expand = c(0,0))+
    geom_hline(yintercept = c(674,437.8,308,0), color = 'black') +
    ylab("Depth (cm)") + xlab(bquote(.(ax_lab)[clr])) + theme(text = element_text(),axis.text.x = element_text( angle = 90, hjust = 1),
    )
  
}

#this smooth value acts as a moving average for the time series
smooth <- 50

Varvesannual <- ggplot(VT, aes(y=Depth_Interpolation/10)) + geom_path(aes(x = `Varve thickness (mm)`), size = 0.2, alpha = 0.5) +
  geom_path(aes(x = rollmean(`Varve thickness (mm)`, smooth, na.pad = T))) +
  scale_x_log10(breaks = c(0.1,0.5,1,2.5,5,10)) +ggpubr::theme_pubr()+
  geom_hline(yintercept = c(674,437.8,308,0), color = 'black') +
  scale_y_reverse(limits = c(720, 0), breaks = seq(0,10000, 500), expand = c(0,0)) + theme(legend.position = 'none')


Varvesseasonal <- ggplot(VT, aes(y=Depth_Interpolation/10)) + 
  geom_path(aes(x = `winter thickness`), color = 'blue',size = 0.2, alpha = 0.5) +
  geom_path(aes(x = rollmean(`winter thickness`, smooth, na.pad = T)),color = 'blue') +
  geom_path(aes(x = `summer thickness`), color = 'red',size = 0.2, alpha = 0.5) +
  geom_path(aes(x = rollmean(`summer thickness`, smooth, na.pad = T)),color = 'red') +
  scale_x_log10(breaks = c(0.1,0.5,1,2.5,5,10)) +ggpubr::theme_pubr()+
  geom_hline(yintercept = c(674,437.8,308,0), color = 'black') +
  scale_y_reverse(limits = c(720, 0), breaks = seq(0,10000, 500), expand = c(0,0)) + theme(legend.position = 'none')



xrf_runav <- 25

Ti_plot<-plot_function(XRFplot$Ti, xrf_runav, 'Ti')
S_plot <- plot_function(XRFplot$ `S`, xrf_runav, 'S')
Mn_plot <- plot_function(XRFplot$`Mn`, xrf_runav, 'Mn')
Fe_plot <- plot_function(XRFplot$`Fe`, xrf_runav, 'Fe')
Ca_plot <- plot_function(XRFplot$`Ca`, xrf_runav, 'Ca')
Si_plot <- plot_function(XRFplot$`Si`, xrf_runav, 'Si')
K_plot <- plot_function(XRFplot$`K`, xrf_runav, 'K')
S_plot<- plot_function(XRFplot$`S`, xrf_runav, 'S')

Figure_2 <- cowplot::plot_grid(Ti_plot,
                               K_plot + theme(axis.text.y = element_blank(),
                                              axis.ticks.y = element_blank(),
                                              axis.title.y = element_blank(),
                                              axis.line.y = element_blank()),
                               Si_plot + theme(axis.text.y = element_blank(),
                                               axis.ticks.y = element_blank(),
                                               axis.title.y = element_blank(),
                                               axis.line.y = element_blank()),
                               S_plot + theme(axis.text.y = element_blank(),
                                              axis.ticks.y = element_blank(),
                                              axis.title.y = element_blank(),
                                              axis.line.y = element_blank()),
                               Fe_plot + theme(axis.text.y = element_blank(),
                                               axis.ticks.y = element_blank(),
                                               axis.title.y = element_blank(),
                                               axis.line.y = element_blank()),
                               Mn_plot+ theme(axis.text.y = element_blank(),
                                              axis.ticks.y = element_blank(),
                                              axis.line.y = element_blank(),
                                              axis.title.y = element_blank()),
                               Varvesseasonal+ theme(axis.text.y = element_blank(),
                                                     axis.ticks.y = element_blank(),
                                                     axis.line.y = element_blank(),
                                                     axis.title.y = element_blank()),
                               Varvesannual+ theme(axis.text.y = element_blank(),
                                                   axis.ticks.y = element_blank(),
                                                   axis.line.y = element_blank(),
                                                   axis.title.y = element_blank()),align = 'hv',
                               nrow = 1)



ggsave(filename = file.path(figdir, "Figure_2.pdf"), plot = Figure_2, width = 319.172, height = 211.361, units = "mm")

print(Figure_2)

2.4 Figure 3: Geochemical signals from the mid-Holocene ferrogenic sections

Show code
#Create subsetted dataframe for plotting
XRF_313_333cm <- XRF %>% filter(`DepthID [mm]` > 3129 & `DepthID [mm]` < 3331)

core_plot <- ggplot(XRF_313_333cm, aes(y= `DepthID [mm]`/10))+ ggpubr::background_image(Core_image) +  geom_rug(aes(color= as.factor(cluster)), sides = 'r', size = 0.25, length = unit(0.1, "npc")) + 
  scale_color_manual(values = c('1'= clust_col[1], '2' = clust_col[2], '3' = clust_col[3], '4' = clust_col[4])) +   scale_y_reverse(limits = c(333, 313), breaks = seq(300,350,1), expand = c(0,0)) + ggpubr::theme_pubr() + theme(legend.position = 'none') + labs(y='Depth (cm)')


Mn_Ti <-ggplot(XRF_313_333cm, aes(y= `DepthID [mm]`/10)) + 
  geom_path(aes(x=rollmean(`ln(Mn/Ti)`, 1, na.pad = T)),  color = 'green4') + 
  scale_y_reverse(limits = c(333, 313), breaks = seq(300,350,1), expand = c(0,0))+
  ggpubr::theme_pubr() +
  theme(axis.text.y=element_blank(), 
        axis.ticks.y=element_blank(),
        axis.title.y = element_blank(),
        axis.line.y = element_blank()) + labs(x= expression(ln(Mn/Ti)))

Fe_Ti <-ggplot(XRF_313_333cm, aes(y= `DepthID [mm]`/10)) + 
  geom_path(aes(x=rollmean(`ln(Fe/Ti)`, 1, na.pad = T)),  color = 'brown') + 
  scale_y_reverse(limits = c(333, 313), breaks = seq(300,350,1), expand = c(0,0))+
  ggpubr::theme_pubr() +
  theme(axis.text.y=element_blank(), 
        axis.ticks.y=element_blank(),
        axis.title.y = element_blank(),
        axis.line.y = element_blank()) + labs(x= expression(ln(Fe/Ti)))


PC2 <-ggplot(XRF_313_333cm, aes(y= `DepthID [mm]`/10)) + 
  geom_path(aes(x=rollmean(PC2, 1, na.pad = T)),  color = PCA_col[2]) + 
  scale_y_reverse(limits = c(333, 313), breaks = seq(300,350,1), expand = c(0,0))+
  ggpubr::theme_pubr() +
  theme(axis.text.y=element_blank(), 
        axis.ticks.y=element_blank(),
        axis.title.y = element_blank(),
        axis.line.y = element_blank())  + labs(x='PC2')
PC1 <-ggplot(XRF_313_333cm, aes(y= `DepthID [mm]`/10)) + 
  geom_path(aes(x=zoo::rollmean(PC1, 1, na.pad = T)),  color = PCA_col[1]) + 
  scale_y_reverse(limits = c(333, 313), breaks = seq(300,350,1), expand = c(0,0))+
  ggpubr::theme_pubr() +
  theme(axis.text.y=element_blank(), 
        axis.ticks.y=element_blank(),
        axis.title.y = element_blank(),
        axis.line.y = element_blank()) + labs(x='PC1')

C_plot <- XRF_313_333cm %>%
  select(`DepthID [mm]`, `Age (calBP)`,`Age (yrCE)`, cluster_1, cluster_2, cluster_3, cluster_4) %>%
  tidyr::pivot_longer(cols = starts_with("cluster_"), 
                      names_to = "cluster", 
                      values_to = "count")


Figure_3 <-cowplot::plot_grid(core_plot,Fe_Ti,Mn_Ti,PC2,PC1, nrow=1, align = 'hv')


ggsave(filename = file.path(figdir, "Figure_3.pdf"), plot = Figure_3, width = 272.509, height = 195.737, units = "mm")

# Render the plot with the correct dimensions
knitr::opts_chunk$set(fig.width = 272.509 / 25.4, fig.height = 195.737 / 25.4) # convert mm to inches
print(Figure_3)

2.5 Figure 4: PCA, clustering & core plot

PCA results

Show code
#Assign depositional stages to subset data for PCAs
# Plot with custom colors
PCA_cluster<-factoextra::fviz_pca_biplot(data.pca,  
                                         legend = 'none',
                                         title = element_blank(),
                                         axes = c(1,2),
                                         label = 'var', 
                                         pointshape = 21,
                                         pointsize = 0.5,
                                         addEllipses = F,
                                         # Customizations
                                         alpha.ind = 0.5,
                                         labelsize = 5,
                                         col.ind = XRF$cluster,
                                         fill.ind = NA,
                                         col.var = 'black',
                                         theme = ggpubr::theme_pubr() +
                                           theme(text = element_text(size = 12),
                                                 axis.title = element_text(size = 14))) +
  scale_color_manual(values = c('1'= clust_col[1], '2' = clust_col[2], '3' = clust_col[3], '4' = clust_col[4])) 

phase_colors <- c( "cyan2", "orangered2",'seagreen2')


PCA_phase<-factoextra::fviz_pca_ind(
  data.pca,
  geom = "point",
  invisible = 'point',
  col.ind = XRF$Phase, # Color by the Phase column
  alpha.ind = 0,
  addEllipses = TRUE, # Add concentration ellipses
  # ellipse.type = 'confidence',
  ellipse.level = 0.99,
  ellipse.alpha = 0
) + ggpubr::theme_pubr() + theme(legend.title = element_blank())


ellipses_layer <- PCA_phase$layers[[2]]
PCA<-PCA_cluster + ellipses_layer + scale_color_manual(values = c(phase_colors,clust_col))

#factoextra::fviz_screeplot(data.pca, addlabels = TRUE)

dim1<-factoextra::fviz_contrib(data.pca, 
                               title = 'Contribution of variable to PC1', color = 'black', fill = PCA_col[1], choice = "var", axes =c(1)) + ggpubr::theme_pubr(margin = FALSE, ) + theme(text = element_text(size = 10),
  axis.title.x = element_blank()) +coord_cartesian( ylim=c(0,50), expand = FALSE )


dim2<-factoextra::fviz_contrib(data.pca, 
                               title = 'Contribution of variable to PC2',color = 'black', fill = PCA_col[2], choice = "var", axes =c(2))+ ggpubr::theme_pubr() +  theme(text = element_text(size = 10),
  axis.title.x = element_blank()) +coord_cartesian( ylim=c(0,50), expand = FALSE )

dims<-cowplot::plot_grid(dim1, dim2, rel_widths = c(1,1),nrow= 2)

Core plots to investigate seasonal signals of PC axes 1 and 2

Show code
######This code is for core G1#######
XRF_imageplot<-XRF %>% filter(`DepthID [mm]` < 60) 
#rotate core image 90 degrees
G1_image_vert <- OpenImageR::rotateFixed(G1_image,90)

###plot assigned ages to geom_vlines
hline_labels <- c(as.character(G1_chronology$`Age YrAD`))
hline_positions <- c(as.numeric(G1_chronology$`Core depth (mm)`))


p_PC1<-ggplot(XRF_imageplot)+  
  ggpubr::background_image(G1_image_vert)+
  geom_rug(aes(y = `DepthID [mm]`,color =as.factor(cluster)), size = 1, length = unit(0.1, "npc")) +
  scale_color_manual(values = c('1'= clust_col[1], '2' = clust_col[2], '3' = clust_col[3], '4' = clust_col[4])) +
  geom_path(aes(y=`DepthID [mm]`, x = `PC1`-mean(PC1)),color = PCA_col[1], linewidth = 0.5)+  
  geom_hline(yintercept = hline_positions, color = 'grey80',linetype = 'dashed', linewidth = 0.5) +
  
  ggpubr::theme_pubr() +scale_y_reverse(limits = c(60,0),breaks = seq(0,60,2), expand = c(0,0)) +
  scale_x_continuous(limits = c(-2.5,2.5)) + theme(legend.position = 'none', text = element_text(size = 10))+ xlab('PC1')

#for (i in seq_along(hline_positions)) {
# p_PC1 <- p_PC1 +
#  annotate("text", y = hline_positions[i], x= 2 , label = hline_labels[i], vjust = -0.5, hjust = 0, angle = 0)
#}


p_PC2<-ggplot(XRF_imageplot)+  
  ggpubr::background_image(G1_image_vert)+
  # geom_rug(aes(y = `DepthID [mm]`,color =as.factor(cluster)), size = 1) +
  geom_path(aes(y=`DepthID [mm]`, x = (`PC2`-mean(PC2))*2),color = PCA_col[2], linewidth = 0.5)+  
  geom_hline(yintercept = hline_positions, color = 'grey80',linetype = 'dashed', linewidth = 0.5) +
  
  ggpubr::theme_pubr() +scale_y_reverse(limits = c(60,0),breaks = seq(0,60,2), expand = c(0,0)) +
  scale_x_continuous(limits = c(-1.5,1.5)) + theme(legend.position = 'none', text = element_text(size = 10))+ xlab('PC2')


for (i in seq_along(hline_positions)) {
  p_PC2 <- p_PC2 +
    annotate("text", y = hline_positions[i], x= 1.1 , label = hline_labels[i], vjust = -0.5, hjust = 0, angle = 0)
}


core_pca_plot <-cowplot::plot_grid(p_PC1,p_PC2 + theme(axis.line.y = element_blank(),
                                                       axis.text.y = element_blank(),
                                                       axis.title.y = element_blank(),
                                                       axis.ticks.y= element_blank()), nrow = 1, align = 'hv')

rm(G1_chronology,XRF_imageplot,G1_image)

Hierarchical clustering results (dendrogram, heatmap and area plot)

Show code
#Extract PC values from XRF and transpose to plot as rows
PCs_t <- as.data.frame(c(XRF[c(22:23)])) %>% as.matrix() %>% t()
#transpose dendrogram to plot as columns
dend_t<-t(dend)
#transpose dendrogram to plot as columns

#plot heatmap and dendrogram 
heatdendrplot<-grid::grid.grabExpr(ComplexHeatmap::draw(ComplexHeatmap::Heatmap(PCs_t, 
                                                                                name = 'PC_value', 
                                                                                cluster_columns = dend_t,
                                                                                column_dend_height = unit (3, 'cm'), use_raster = TRUE, raster_quality = 5,
                                                                                show_row_dend = FALSE,  # Remove row dendrogram
                                                                                heatmap_height = unit (5, 'cm'), 
                                                                                column_split = k,
                                                                                heatmap_legend_param = list(
                                                                                  legend_direction = "horizontal", 
                                                                                  legend_width = unit(5, "cm")
                                                                                  
                                                                                )),                                                         heatmap_legend_side="bottom"))

Clust_plot <- XRF %>%
  select(`DepthID [mm]`, `Age (calBP)`,`Age (yrCE)`, cluster_1, cluster_2, cluster_3, cluster_4) %>%
  tidyr::pivot_longer(cols = starts_with("cluster_"), 
                      names_to = "cluster", 
                      values_to = "count")

Combine plots

Show code
#Combine plots and save
Figure_4 <-cowplot::plot_grid(heatdendrplot,PCA,dims,core_pca_plot, rel_heights = c(0.8,1), nrow = 2, align = 'hv', labels = c('A.', 'B.', 'C.', 'D.'))
ggsave(filename = file.path(figdir, "Figure_4.pdf"), plot = Figure_4, width = 181, height = 190, units = "mm")


# Render the plot with the correct dimensions
#knitr::opts_chunk$set(fig.width = 181 / 25.4, fig.height = 154 / 25.4) # convert mm to inches
print(Figure_4)

2.6 Discussion figures:

The discussion figures include multiple time series which are loaded into an RData file in Section 2.1. These include the varve thickness data and GDD data from Nautajarvi ( Ojala and Alenius (2005)) pollen-derived temperature reconstructions from northern (Salonen et al. (2024)) and southern Scandinavia (Sejrup et al. (2016)), dendrological isotopic reconstructions (Helama et al. (2021)), alkenone sea surface temperature records from the North Atlantic (Sicre et al. (2021)) and model-derived Boreal temperature reconstructions (Van Dijk et al. (2024)). The following code plots these time series against the NAU-23 principal components.

Figure 5

Show code
max_time <- 9900
min_time <- -73

HTM_l <- c(7100, 5000)



cp<-ggplot(Clust_plot, aes(x = `Age (calBP)`, y = count, fill = cluster)) +
  geom_area(color = 'black', size = 0.1) +
  scale_fill_manual(values = c('cluster_1' = clust_col[1],'cluster_2'=clust_col[2],'cluster_3' = clust_col[3], 'cluster_4' = clust_col[4]))+
  geom_vline(xintercept = HTM_l, color = 'black')+
  geom_vline(xintercept = c(7100,5000))+
  scale_y_continuous(limits =c(0,100), breaks =seq(0,10000, 10), expand = c(0,0))+
  scale_x_reverse(limits =c(max_time,min_time), breaks =seq(-10000,10000, 500), expand = c(0,0))+
  labs( 
    x = "Varve yr BP", 
    y = "(%)") +
  ggpubr::theme_pubr() +
  theme(legend.position = 'none')



TJul <- ggplot(Salonen, aes(x= Cal_a_BP)) + geom_ribbon(aes(ymin = Tjul_min95, ymax = Tjul_max95), fill = 'orange', alpha = 0.2)+geom_path(aes(y=Tjul_Median)) + 
    geom_vline(xintercept = HTM_l, color = 'black')+
  scale_x_reverse(limits = c(max_time,min_time), breaks = seq(0,10000, 500), expand = c(0,0))+ 
  scale_y_continuous(limits = c(9.8, 16.8), breaks = seq(0,50,1))+
  ggpubr::theme_pubr()



VTplot <- tidyr::pivot_longer(VT[c(3,4,8,9)], 
                              cols = c(`Varve thickness (mm)`, `winter thickness`, `summer thickness`), 
                              names_to = "Type", 
                              values_to = "Thickness (mm)")

VTplot <- VTplot %>%
  group_by(Type) %>%
  arrange(BP) %>%
  mutate(`50yr_running_mean` = rollmean(`Thickness (mm)`, 50, fill = NA, align = "right"))

Varves <- ggplot(VTplot, aes(x=BP)) + geom_path(aes(y = `Thickness (mm)`, color = Type), size = 0.1, alpha = 0.5) +
  geom_path(aes(y = `50yr_running_mean`, color = Type), linetype = 'solid', size = 0.25) +
  geom_vline(xintercept = HTM_l, color = 'black')+
  scale_color_manual(values = c('red','black','blue'))+
  scale_y_log10(breaks = c(0.1,0.5,1,2.5,5,10)) +ggpubr::theme_pubr()+
  scale_x_reverse(limits = c(max_time,min_time), breaks = seq(0,10000, 500), expand = c(0,0)) + theme(legend.position = 'none')


PCs <- tidyr::pivot_longer(XRFannual[c('Age (calBP)','PC1','PC2')], 
                           cols = c(`PC1`, `PC2`), 
                           names_to = "PC", 
                           values_to = "Value")

PCs <- PCs %>%
  group_by(PC) %>%
  arrange(`Age (calBP)`) %>%
  mutate(`50yr_running_mean` = rollmean(`Value`, 50, fill = NA, align = "right"))


PC <- ggplot(PCs, aes(x=`Age (calBP)`)) + geom_path(aes(y = `Value`, color = PC), size = 0.1, alpha = 0.5) +
  geom_path(aes(y = `50yr_running_mean`, color = PC), linetype = 'solid', size = 0.25) +
  geom_vline(xintercept = HTM_l, color = 'black')+
  scale_color_manual(values = c(PCA_col))+
  ggpubr::theme_pubr()+
  scale_x_reverse(limits = c(max_time,min_time), breaks = seq(0,10000, 500), expand = c(0,0))+ theme(legend.position = 'none') +
  scale_y_continuous(limits = c(-3,2.5), breaks = seq(-10,10,1))


GDD_plot <- ggplot(GDD, aes(x= `Age (BP)`, y= GDD)) + 
    geom_rect(aes(ymax = 1400, ymin = 1280, xmin = -Inf, xmax = Inf), fill = 'grey80', alpha = 0.1)+
  geom_path() +
  geom_path(aes(y= zoo::rollmean(GDD, 10, na.pad = T)), colour = 'red') +
  geom_vline(xintercept = HTM_l, color = 'black')+
  geom_hline(yintercept = 1280) + geom_hline(yintercept = 1400) +
  geom_hline(yintercept = 1340)+ggpubr::theme_pubr()+
  scale_x_reverse(limits = c(max_time,min_time), breaks = seq(0,10000, 500), expand = c(0,0)) + theme(legend.position = 'none')


Sejrup_PC <- ggplot(Sejrup_stack, aes(x=`Age (yr BP)`, y = `PC`)) + 
  geom_ribbon(aes(ymin = `PC (+1 sigma)`, ymax = `PC (-1 sigma)`), fill = 'forestgreen', alpha = 0.2)+
  geom_ribbon(aes(ymin = `PC (+2 sigma)`, ymax = `PC (-2 sigma)`), fill = 'forestgreen', alpha = 0.2)+
  geom_path()+ 
  geom_ribbon(data =Sejrup_stack_PC2, aes(y = `PC (median)`,ymin = `PC (+1 sigma)`, ymax = `PC (-1 sigma)`), fill = 'navy', alpha = 0.2)+
  geom_ribbon(data =Sejrup_stack_PC2,aes(y = `PC (median)`,ymin = `PC (+2 sigma)`, ymax = `PC (-2 sigma)`), fill = 'navy', alpha = 0.2)+
  geom_path(data =Sejrup_stack_PC2,aes(x=`Age (yr BP)`, y = `PC (median)`))+  geom_hline(yintercept = 0)+ggpubr::theme_pubr()+
  geom_vline(xintercept = HTM_l, color = 'black')+
  ggpubr::theme_pubr()+
  scale_x_reverse(limits = c(max_time,min_time), breaks = seq(0,10000, 500), expand = c(0,0)) + theme(legend.position = 'none')+
  scale_y_continuous(limits = c(-4.6,2), breaks = seq(-10,10,2))

Figure_5<-cowplot::plot_grid(cp + theme(axis.title.x = element_blank(),
                              axis.line.x = element_blank(),
                              axis.text.x = element_blank(),
                              axis.ticks.x = element_blank()),
                   Varves+ theme(axis.title.x = element_blank(),
                                 axis.line.x = element_blank(),
                                 axis.text.x = element_blank(),
                                 axis.ticks.x = element_blank()),
                   PC+ theme(axis.title.x = element_blank(),
                             axis.line.x = element_blank(),
                             axis.text.x = element_blank(),
                             axis.ticks.x = element_blank()),
                   GDD_plot+ theme(axis.title.x = element_blank(),
                                   axis.line.x = element_blank(),
                                   axis.text.x = element_blank(),
                                   axis.ticks.x = element_blank()), 
                   TJul+ theme(axis.title.x = element_blank(),
                               axis.line.x = element_blank(),
                               axis.text.x = element_blank(),
                               axis.ticks.x = element_blank()),
                   Sejrup_PC, ncol=1, align = 'hv', rel_heights = c(0.5,1,1,1,1,1), labels = c('A.','B.','C.','D.','E.','F.'))


ggsave(filename = file.path(figdir, "Figure_5.pdf"), plot = Figure_5, width = 190, height = 280, units = "mm")

print(Figure_5)

Figure 6

Show code
XRF_plot <- XRF %>% filter(`Age (calBP)` >3999 & `Age (calBP)`<8001)
smot <- 50

min_time2 <- 4500
max_time2  <- 7500


#add boxes around observed events and phases discussed in the manuscript
add_events <- function(box_col, box_col1) {
  list(
    geom_rect(aes(xmax = 5000, xmin = 7000, ymin = -Inf, ymax = Inf), fill = box_col1, alpha = 0.2, inherit.aes = FALSE),
    geom_rect(aes(xmax = 6910, xmin = 6790, ymin = -Inf, ymax = Inf), fill = box_col, alpha = 0.2, inherit.aes = FALSE),
    geom_rect(aes(xmax = 6660, xmin = 6560, ymin = -Inf, ymax = Inf), fill = box_col, alpha = 0.2, inherit.aes = FALSE),
    geom_rect(aes(xmax = 6405, xmin = 6290, ymin = -Inf, ymax = Inf), fill = box_col, alpha = 0.2, inherit.aes = FALSE),
    geom_rect(aes(xmax = 6250, xmin = 6080, ymin = -Inf, ymax = Inf), fill = box_col, alpha = 0.2, inherit.aes = FALSE),
    geom_rect(aes(xmax = 5980, xmin = 5950, ymin = -Inf, ymax = Inf), fill = box_col, alpha = 0.2, inherit.aes = FALSE),
    geom_rect(aes(xmax = 5850, xmin = 5810, ymin = -Inf, ymax = Inf), fill = box_col, alpha = 0.2, inherit.aes = FALSE),
    geom_rect(aes(xmax = 5430, xmin = 5380, ymin = -Inf, ymax = Inf), fill = box_col, alpha = 0.2, inherit.aes = FALSE),
    geom_rect(aes(xmax = 5350, xmin = 5280, ymin = -Inf, ymax = Inf), fill = box_col, alpha = 0.2, inherit.aes = FALSE),
    geom_rect(aes(xmax = 5230, xmin = 5100, ymin = -Inf, ymax = Inf), fill = box_col, alpha = 0.2, inherit.aes = FALSE)
  )
}

#define the box colour for these events/ phases
box_col1 <- 'grey90'
box_col <- 'grey60'


#NAU-23 XRF PC-2
p <-ggplot(XRF_plot, aes(x= `Age (calBP)`, y = PC2)) + add_events(box_col, box_col1)+
  geom_rug(data = XRF, aes(x= `Age (calBP)`, color = cluster), linewidth = 5, sides = 't', show.legend = FALSE)+ scale_color_manual(values = clust_col)+
  geom_path(aes(y = rollmean(PC2,1, na.pad = TRUE)),color = PCA_col[2], alpha = 0.5, linetype = 'solid', size = 0.2) +
  geom_path(aes(y = rollmean(PC2, smot, na.pad = TRUE)), color = 'purple4', linetype = 'solid', size = 0.5) +
  geom_hline(yintercept = mean(XRF_plot$PC2))+
  scale_x_reverse(limits = c(7500,4500), breaks = seq(7500,4500,-100), expand = c(0,0))+  theme(legend.position = 'none') +
  ggpubr::theme_pubr()

# Calculate the mean GDD between 4500 and 6500 BP
GDD_MH <- GDD %>%
  filter(`Age (BP)` >= 5000 & `Age (BP)` <= 7100) %>%
  summarize(mean_GDD = mean(GDD, na.rm = TRUE)) %>%
  pull(mean_GDD)

# Adjust the GDD values to represent anomalies wrt the mid Holocene mean
GDD <- GDD %>%
  mutate(GDD_anomaly = GDD - GDD_MH)

#Plot the GDD anomaly
GDD_anom<-ggplot(GDD, aes(x = `Age (BP)`, y = GDD_anomaly)) + 
  add_events(box_col, box_col1)+
  geom_path() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_area(aes(y = ifelse(GDD_anomaly > 0, GDD_anomaly, 0)), fill = "red", alpha = 0.3) +
  geom_area(aes(y = ifelse(GDD_anomaly < 0, GDD_anomaly, 0)), fill = "blue", alpha = 0.3) +
  scale_x_reverse(limits = c(7500, 4500), breaks = seq(0, 10000, 500), expand = c(0, 0)) +
  ggpubr::theme_pubr() +
  theme(legend.position = 'none') +
  labs(x = "Age (BP)", y = "Adjusted GDD")

#Plot model-derived Boreal temps from van Dijk et al. (2024)
Tas <- ggplot(Van_Dijk_Boreal, aes(x=`Age_BP`, y=Annual)) + 
  add_events(box_col, box_col1)+
  geom_hline(yintercept = 0)+ ggpubr::theme_pubr()+
  geom_path(color = 'red',)+
  geom_path(aes(y = rollmean(Annual, 50, na.pad = TRUE)), color = 'red4', linetype = 'solid', size = 0.5) +
  scale_x_reverse(limits = c(7500,4500), breaks = seq(7500,4500,-100), expand = c(0,0))+
  labs(
    # title = "NAU mean annual precipitation reconstruction",
    x = "Age (calBP)",
    y = expression("T (\u00B0C)")
  )


#Plot TSI
TSI <- ggplot(Steinhilber, aes(x=`Age`, y=dTSI)) + 
  add_events(box_col, box_col1)+
  geom_hline(yintercept = 0)+ 
  ggpubr::theme_pubr()+
  geom_path(color = 'red')+
  ggpubr::theme_pubr()+
  scale_x_reverse(limits = c(7500,4500), breaks = seq(7500,4500,-100), expand = c(0,0))+
  labs(
    # title = "NAU mean annual precipitation reconstruction",
    x = "Age (calBP)",
    y = expression(Delta * "TSI")
  )


#Plot dendro-derived isotopic reconstructions from Helama et al. (2021)
H <-ggplot(Helama, aes(x=Age, y = oktas_r)) +  
  add_events(box_col, box_col1)+
  geom_path()+ scale_x_reverse(limits = c(7500,4500), breaks = seq(7500,4500,-100), expand = c(0,0))+ ggpubr::theme_pubr()


#Plot N. Atlantic SSTs from MD99-2275 (Sicre et al., 2021)
M99 <-ggplot(Sicre_MD99_2275, aes(x=yearBP, y = SST)) + 
  add_events(box_col, box_col1)+
  geom_path(color = 'green2')+
  geom_path(aes(y = rollmean(SST, 5, na.pad = TRUE)),color= 'green4', linetype = 'solid', size = 0.5) +
  scale_x_reverse(limits = c(7500,4500), breaks = seq(7500,4500,-100), expand = c(0,0))+
  scale_y_continuous(limits = c(6.5, 12.5))+ 
  ggpubr::theme_pubr()+
  labs(
    # title = "NAU mean annual precipitation reconstruction",
    x = "Age (calBP)",
    y = expression("SST (\u00B0C)")
  )

#Plot N. Atlantic IRD stack from Bond et al. (2001).
IRD <- ggplot(Bond_IRD, aes(x=age, y = stacked)) +   
  add_events(box_col, box_col1)+
  geom_path()+ ggpubr::theme_pubr()+
  scale_x_reverse(limits = c(7500,4500), breaks = seq(7500,4500,-100), expand = c(0,0)) + xlab('Age (varve yr BP)')

Figure_6 <-cowplot::plot_grid(p + theme(axis.title.x = element_blank(),
                             axis.line.x = element_blank(),
                             axis.text.x = element_blank(),
                             axis.ticks.x = element_blank()),
                   GDD_anom+ theme(axis.title.x = element_blank(),
                                   axis.line.x = element_blank(),
                                   axis.text.x = element_blank(),
                                   axis.ticks.x = element_blank()),
                   TSI+ theme(axis.title.x = element_blank(),
                              axis.line.x = element_blank(),
                              axis.text.x = element_blank(),
                              axis.ticks.x = element_blank()),
                   Tas +theme(axis.title.x = element_blank(),
                              axis.line.x = element_blank(),
                              axis.text.x = element_blank(),
                              axis.ticks.x = element_blank()),
                   H+ theme(axis.title.x = element_blank(),
                            axis.line.x = element_blank(),
                            axis.text.x = element_blank(),
                            axis.ticks.x = element_blank()),
                   M99+ theme(axis.title.x = element_blank(),
                              axis.line.x = element_blank(),
                              axis.text.x = element_blank(),
                              axis.ticks.x = element_blank()),
                   IRD,align = 'hv', ncol = 1)




ggsave(filename = file.path(figdir, "Figure_6.pdf"), plot = Figure_6, width = 231.353, height = 323.545, units = "mm")

print(Figure_6)

2.7 Tables

Show code
# Summarize the percentage of clusters by phase
summary_table <- XRF %>%
  group_by(Phase, cluster) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  group_by(Phase) %>%
  mutate(percentage = count / sum(count) * 100) %>%
  select(Phase, cluster, percentage) %>%
  pivot_wider(names_from = cluster, values_from = percentage, values_fill = list(percentage = 0))

# Create a formatted table using gt
gt_table <- gt(summary_table) %>%
  fmt_number(
    columns = (-Phase),
    decimals = 2
  ) %>%
  cols_label(
    Phase = "Phase",
    `1` = "Cluster 1 (%)",
    `2` = "Cluster 2 (%)",
    `3` = "Cluster 3 (%)",
    `4` = "Cluster 4 (%)"
  ) %>%
  tab_options(
    table.font.size = 12,
    heading.title.font.size = 14
  )

# Print the gt table
gt_table
Cluster 1 (%) Cluster 2 (%) Cluster 3 (%) Cluster 4 (%)
Phase 1
48.84 40.38 1.19 9.58
Phase 2
5.31 16.93 26.57 51.19
Phase 3
8.75 60.97 2.43 27.85

3 Supplementary Data

The following section presents supplementary information and data from the analyses of the NAU-23 sequence. ?@fig-supplementaryone shows the marker horizons used to transfer the Nautajarvi varve chronology to the NAU-23 sequence ( Ojala and Alenius (2005)).

Supplementary Figure 1. The NAU 23 composite stratigraphy showing the location of marker horizons used to transfer the varve counted chronology.

Supplementary Figure 2. SEM images of the colloidal material deposited in the HTM. Analysis on the clastic laminae demonstrates that they represent plagioclase and potassium feldspars, consistent with the catchment geology (Sjöblom, 1990). These analyses support continual clastic biogenic varve formation throughout the HTM.

Co-variance matrices

Show code
#subset selected elements 
elements <- c('Si','S','K','Ca','Ti','Mn','Fe', 'PC1', 'PC2')
#plot correlation matrix
Corr <-as.matrix(cor(XRF[c(paste0(elements))]))
#####plot correlation matrix of different phases


####filter for EH
XRF_EH <- XRF %>% filter(`Age (calBP)` > 7100 & `Age (calBP)` <= 9900)
Corr_EH <-as.matrix(cor(XRF_EH[c(paste0(elements))]))
####filter for HTM 
XRF_HTM <- XRF %>% filter(`Age (calBP)` >= 5000 & `Age (calBP)` <= 7100)
Corr_HTM <-as.matrix(cor(XRF_HTM[c(paste0(elements))]))
####filter for LH 
XRF_LH <- XRF %>% filter(`Age (calBP)` >= 50 & `Age (calBP)` <= 5000)
Corr_LH <-as.matrix(cor(XRF_LH[c(paste0(elements))]))


#plot data in from different sections
par(mfrow=c(2,2))

corrplot::corrplot(Corr,  method = 'circle', type = 'lower', insig='blank', digits = 2,
                   addCoef.col ='black', number.cex = 0.8, order = 'original',title='All varve data (<9.8ka)', diag=FALSE)
corrplot::corrplot(Corr_EH,  method = 'circle', type = 'lower', insig='blank', digits = 2,
                   addCoef.col ='black', number.cex = 0.8, order = 'original',title = 'Early Holocene (9.8-7ka)', diag=FALSE)
corrplot::corrplot(Corr_HTM,  method = 'circle', type = 'lower', insig='blank', digits = 2,
                   addCoef.col ='black', number.cex = 0.8, order = 'original',title = 'HTM (7-5ka)', diag=FALSE)
corrplot::corrplot(Corr_LH,  method = 'circle', type = 'lower', insig='blank', digits = 2,
                   addCoef.col ='black', number.cex = 0.8, order = 'original',title = 'Late Holocene (5-0ka)', diag=FALSE)

NBClust results

This section lists the code used to run the NBClust of Charrad et al. (2014) in R to objectively determine the best number of clusters to use for the NAU-23 dataset.

Show code
#Run NBClust: Took over a day. So avoid if possible
#tictoc::tic("Ward's D2 all")
#Wards_D_Nbclust<-NbClust::NbClust(data = clust_XRF, diss = NULL, distance = 'euclidean', index = 'all',min.nc = 4, max.nc = 10, method = c('ward.D2'))
#tictoc::toc()

load(paste0(RDATdir, 'Wards_D_NbClust.RData'))

#Write the best number of cluster from each indices into a data frame
nbclust_best<- data.frame(Wards_Nbclust_scale$Best.nc)

#Write a function to list and plot the NbClust results
my_fviz_nbclust <- function(nbclust_result, print.summary = TRUE, barfill = "steelblue", barcolor = "steelblue"){
  
  # Remove columns where Number_clusters < 4 and NAs
  best_nc <- as.data.frame(t(nbclust_best), stringsAsFactors = TRUE)
  best_nc <- best_nc %>%
    filter(Number_clusters >= 4 | is.na(Number_clusters)) 
  best_nc <- na.omit(best_nc)
  proposed_clusters <- best_nc$Number_clusters
  
  best_nc$Number_clusters <- as.factor(best_nc$Number_clusters)
  print(best_nc)
  all_clusters <- as.factor(4:10)
  ss <- table(factor(best_nc$Number_clusters, levels = all_clusters))
  cat("Among all indices: \n===================\n")
  for (i in 1:length(ss)) {
    cat("*", ss[i], "proposed ", names(ss)[i], "as the best number of clusters\n")
  }
  cat("\nConclusion\n=========================\n")
  cat("* According to the majority rule, the best number of clusters is ", 
      names(which.max(ss)), ".\n\n")
  df <- data.frame(Number_clusters = names(ss), freq = ss, 
                   stringsAsFactors = TRUE)
  df$Number_clusters <- factor(df$Number_clusters, levels = c(4, 5, 6, 7, 8, 9, 10))
  
  df <- df[c(1,3)]
  colnames(df) <- c('Number_clusters', 'freq')
  p <- ggpubr::ggbarplot(df, x = "Number_clusters", y = "freq", 
                         fill = "steelblue", color = "black") + coord_cartesian( ylim=c(0,15), expand = FALSE ) +
    ggplot2::labs(x = "Number of clusters k", 
                  y = "Frequency among all indices",
                  title = paste0("Optimal number of clusters - k = ", 
                                 names(which.max(ss)))) 
  p
}

#run function
my_fviz_nbclust(nbclust_best)
           Number_clusters   Value_Index
KL                       6  3.397000e+00
CH                       4  1.798020e+04
Hartigan                 6  2.580191e+03
CCC                      4 -8.747960e+01
Scott                    6  1.223032e+04
Marriot                  6  1.641167e+26
TrCovW                   6  1.136795e+08
TraceW                   6  5.814667e+03
Friedman                 6  5.277100e+00
Rubin                    6 -1.883000e-01
Cindex                  10  1.223000e-01
DB                       4  1.257600e+00
Silhouette               4  2.661000e-01
Beale                    4  1.706500e+00
Ratkowsky                4  3.905000e-01
Ball                     5  6.294946e+03
PtBiserial               4  5.238000e-01
Frey                     4  3.025500e+00
McClain                  4  1.054900e+00
Dunn                     4  1.270000e-02
SDindex                  4  2.203800e+00
SDbw                    10  5.393000e-01
Among all indices: 
===================
* 11 proposed  4 as the best number of clusters
* 1 proposed  5 as the best number of clusters
* 8 proposed  6 as the best number of clusters
* 0 proposed  7 as the best number of clusters
* 0 proposed  8 as the best number of clusters
* 0 proposed  9 as the best number of clusters
* 2 proposed  10 as the best number of clusters

Conclusion
=========================
* According to the majority rule, the best number of clusters is  4 .

Cluster PCAs

Show code
c1 <- scaled_data %>% filter(cluster ==1)
c1.pca <- princomp(c1[c('Si','S','K','Ca','Ti','Mn','Fe')])
c1p<-factoextra::fviz_pca_var(c1.pca, 
                              title = 'Cluster 1',
                              
                              axes = c(1,2),
                              label = 'var', 
                              labelsize = 3,
                              #fill.ind = XRF$wards_cluster, # Color for individuals
                              col.var =  clust_col[1],
                              theme = ggpubr::theme_pubr() +
                                theme(text = element_text(size = 12),
                                      axis.title = element_text(size = 14),
                                      legend.position = "right")) + 
scale_y_continuous(limits = c(-0.25,0.25))+ scale_x_continuous(limits = c(-0.25,0.25))

c2 <- scaled_data %>% filter(cluster ==2)
c2.pca <- princomp(c2[c('Si','S','K','Ca','Ti','Mn','Fe')])
c2p<- factoextra::fviz_pca_var(c2.pca, 
                               title = 'Cluster 2',
                               axes = c(1,2),
                               label = 'var', 
                               labelsize = 3,
                               col.var = clust_col[2],
                               theme = ggpubr::theme_pubr() +
                                 theme(text = element_text(size = 12),
                                       axis.title = element_text(size = 14),
                                       legend.position = "right")) +
scale_y_continuous(limits = c(-0.25,0.25))+
scale_x_continuous(limits = c(-0.25,0.25))

c3 <- scaled_data %>% filter(cluster ==3)
c3.pca <- princomp(c3[c('Si','S','K','Ca','Ti','Mn','Fe')])
c3p<-factoextra::fviz_pca_var(c3.pca, 
                              title = 'Cluster 3',
                              
                              axes = c(1,2),
                              label = 'var', 
                              # Customizations
                              labelsize = 3,
                              #fill.ind = XRF$wards_cluster, # Color for individuals
                              col.var = clust_col[3],
                              theme = ggpubr::theme_pubr() +
                                theme(text = element_text(size = 12),
                                      axis.title = element_text(size = 14),
                                      legend.position = "right")) +
 scale_y_continuous(limits = c(-0.25,0.25))+
  scale_x_continuous(limits = c(-0.25,0.25))

c4 <- scaled_data %>% filter(cluster ==4)
c4.pca <- princomp(c4[c('Si','S','K','Ca','Ti','Mn','Fe')])
c4p<-factoextra::fviz_pca_var(c4.pca, 
                              title = 'Cluster 4',
                              axes = c(1,2),
                              label = 'var', 
                              # Customizations
                              labelsize = 3,
                              col.var = clust_col[4],
                              theme = ggpubr::theme_pubr() +
                                theme(text = element_text(size = 12),
                                      axis.title = element_text(size = 14),
                                      legend.position = "right")) +
  scale_y_continuous(limits = c(-0.2,0.2))+
  scale_x_continuous(limits = c(-0.25,0.25))

PCA_grid <- cowplot::plot_grid(c1p, c2p, c3p, c4p, nrow = 2)




print(PCA_grid)

References

Charrad, Malika, Nadia Ghazzali, Véronique Boiteau, and Azam Niknafs. 2014. “NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set.” Journal of Statistical Software 61 (November): 1–36. https://doi.org/10.18637/jss.v061.i06.
Helama, Samuli, Markus Stoffel, Richard J. Hall, Phil D. Jones, Laura Arppe, Vladimir V. Matskovsky, Mauri Timonen, Pekka Nöjd, Kari Mielikäinen, and Markku Oinonen. 2021. “Recurrent Transitions to Little Ice Age-Like Climatic Regimes over the Holocene.” Climate Dynamics 56 (11): 3817–33. https://doi.org/10.1007/s00382-021-05669-0.
Ojala, Antti E. K., and Teija Alenius. 2005. “10000 Years of Interannual Sedimentation Recorded in the Lake Nautajärvi (Finland) Clasticorganic Varves.” Palaeogeography, Palaeoclimatology, Palaeoecology 219 (3): 285–302. https://doi.org/10.1016/j.palaeo.2005.01.002.
Salonen, J. Sakari, Niina Kuosmanen, Inger G. Alsos, Peter D. Heintzman, Dilli P. Rijal, Frederik Schenk, Freja Bogren, et al. 2024. “Uncovering Holocene Climate Fluctuations and Ancient Conifer Populations: Insights from a High-Resolution Multi-Proxy Record from Northern Finland.” Global and Planetary Change 237 (June): 104462. https://doi.org/10.1016/j.gloplacha.2024.104462.
Sejrup, Hans Petter, Heikki Seppä, Nicholas P. McKay, Darrell S. Kaufman, Áslaug Geirsdóttir, Anne de Vernal, Hans Renssen, Katrine Husum, Anne Jennings, and John T. Andrews. 2016. “North Atlantic-Fennoscandian Holocene Climate Trends and Mechanisms.” Quaternary Science Reviews, Special issue: PAST gateways (palaeo-arctic spatial and temporal gateways), 147 (September): 365–78. https://doi.org/10.1016/j.quascirev.2016.06.005.
Sicre, Marie-Alexandrine, Bassem Jalali, Jón Eiríksson, Karen-Luise Knudsen, Vincent Klein, and Violaine Pellichero. 2021. “Trends and Centennial-Scale Variability of Surface Water Temperatures in the North Atlantic During the Holocene.” Quaternary Science Reviews 265 (August): 107033. https://doi.org/10.1016/j.quascirev.2021.107033.
Van Dijk, Evelien J. C., Johann Jungclaus, Michael Sigl, Claudia Timmreck, and Kirstin Krüger. 2024. “High-Frequency Climate Forcing Causes Prolonged Cold Periods in the Holocene.” Communications Earth & Environment 5 (1): 242. https://doi.org/10.1038/s43247-024-01380-0.
Weltje, Gert, Menno Bloemsma, Rik Tjallingii, D Heslop, Ursula Röhl, Ian Croudace, and I Croudace. 2015. “Prediction of Geochemical Composition from XRF Core Scanner Data: A New Multivariate Approach Including Automatic Selection of Calibration Samples and Quantification of Uncertainties.” In, 507–34. https://doi.org/10.1007/978-94-017-9849-5_21.