In this exercise, you will estimate species diversity using species accumulation curves and extrapolation models. You will use field data collected by your group from Svanninge Bjerge or Svanninge Bakker. The aim is to compare biodiversity between the different grassland areas surveyed by the four student groups.
By the end, you should be able to:
Please follow the steps below carefully. This document uses some simulated data — you should replace the simulated data with your own cleaned dataset. Your results will look similar (probably!) but definitely not the same.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(vegan)
## Loading required package: permute
## Loading required package: lattice
Make sure your dataset is in long format, with one row per species observation per quadrat. Your dataset should include at least these three columns:
Site
(e.g. “Lerbjerg S” or “Lerbjerg N” etc.)QuadratID
(a unique code or number for each quadrat
(e.g. Q1, Q2…))Species
(the species name, check carefully for
typos!)Save your data as a CSV file and read it into R using:
# Replace this with the path to your CSV file
quadrat_data <- read.csv("your_data.csv")
Double-check column names and the first few rows:
names(quadrat_data)
head(quadrat_data)
This is a good point to check your species data for spelling errors. Take a look at the species list, and look for oddities.
table(quadrat_data$Species)
This section uses simulated data for demonstration only! Use your own data.
# Simulated example (for reference only)
quadrat_data <- read.csv("simulated_quadrat_data_site1.csv")
# Check structure
names(quadrat_data)
## [1] "Site" "QuadratID" "Species"
head(quadrat_data)
## Site QuadratID Species
## 1 Site1 Q1 Achillea_millefolium
## 2 Site1 Q1 Agrostis_capillaris
## 3 Site1 Q1 Bellis_perennis
## 4 Site1 Q1 Galium_verum
## 5 Site1 Q1 Knautia_arvensis
## 6 Site1 Q1 Leucanthemum_vulgare
We now transform the data from long format to wide format
(presence/absence matrix), as required by the vegan
package.
species_pa <- quadrat_data %>%
distinct() %>% # Removes duplicates
mutate(Presence = 1) %>%
pivot_wider(names_from = Species,
values_from = Presence,
values_fill = 0) %>%
select(-Site, -QuadratID)
head(species_pa)
## # A tibble: 6 × 24
## Achillea_millefolium Agrostis_capillaris Bellis_perennis Galium_verum
## <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 1
## 2 0 0 1 1
## 3 0 0 0 0
## 4 1 0 0 1
## 5 1 0 1 0
## 6 1 1 0 1
## # ℹ 20 more variables: Knautia_arvensis <dbl>, Leucanthemum_vulgare <dbl>,
## # Plantago_lanceolata <dbl>, Plantago_major <dbl>, Prunella_vulgaris <dbl>,
## # Briza_media <dbl>, Festuca_rubra <dbl>, Hypochaeris_radicata <dbl>,
## # Medicago_lupulina <dbl>, Rhinanthus_minor <dbl>, Rumex_acetosa <dbl>,
## # Trifolium_repens <dbl>, Centaurea_jacea <dbl>, Trifolium_pratense <dbl>,
## # Anthoxanthum_odoratum <dbl>, Campanula_rotundifolia <dbl>,
## # Dactylis_glomerata <dbl>, Ranunculus_acris <dbl>, …
The specaccum()
function in the vegan
package calculates a species accumulation curve, which
shows how the number of species increases as more quadrats are sampled.
It does this by randomly adding quadrats one at a time and recording how
many new species are found with each addition. This process is repeated
many times to get an average curve and confidence intervals. The result
helps us understand whether our sampling effort is sufficient—if the
curve levels off, we’re likely capturing most of the species present; if
it keeps rising, more sampling may be needed. Therefore the slope of the
curve is very important!
accum <- specaccum(species_pa, method = "random")
accum
## Species Accumulation Curve
## Accumulation method: random, with 100 permutations
## Call: specaccum(comm = species_pa, method = "random")
##
##
## Sites 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.0000
## Richness 9.820000 14.750000 18.140000 20.000000 21.280000 22.160000 22.8800
## sd 2.143253 2.105188 2.089029 1.632993 1.333939 1.079843 0.7691
##
## Sites 8.000000 9.000000 10
## Richness 23.320000 23.630000 24
## sd 0.601009 0.485237 0
Now we can plot the data.
plot(accum, ci.type = "polygon", ci.col = "lightblue",
main = "Species Accumulation Curve",
xlab = "Quadrats", ylab = "Species Richness")
The specpool
function provides non-parametric
estimators of total species richness based on this curve. These
estimates are based on the number of rare species
(species observed in only one or two quadrats) and are useful for
gauging how many species are likely to be present but undetected.
specpool(species_pa)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 24 25.35 2.094636 26.7 1.558846 26.96667 25.52475 1.229653 10
Here, Species
is observed number of species, while
chao
(Chao1), jack1
(Jackknife), and
boot
(Bootstrap) estimators give predictions of the true
species richness by accounting for undetected, rare species. If these
estimates are all fairly close to one another it suggests that the
sampling effort was sufficient to capture most of the species
present.
The Chao1 estimator is the most often used. It gives a lower-bound estimate of true species richness by focusing on rare species that are found in only one or two quadrats. It is calculated as:
\[\hat{S}_{\text{Chao1}} = S_{\text{obs}} + \frac{f_1^2}{2f_2}\]
where \(S_{\text{obs}}\) is the number of observed species, \(f_1\) is the number of species found in only one quadrat (singletons), and \(f_2\) is the number found in exactly two quadrats (doubletons). If no doubletons are observed, a corrected version is used. Chao1 is useful when sampling is incomplete, as it accounts for unseen species based on the frequency of rare ones.
In this section, we fit mathematical models—such as the Michaelis–Menten and Clench functions—to the species accumulation curve to estimate total species richness. These models describe how species accumulate with increasing sampling effort and allow us to estimate the asymptote, or the point where no new species are likely to be found. This modelling approach can be more informative than simpler estimators like Chao1, because it uses the full pattern of accumulation across all samples, rather than relying only on rare species. As a result, these models often provide a smoother and potentially more accurate estimate of total diversity, especially when sampling effort varies or when some species are common.
specaccum
), which is problematic if your data has not
actually reached an asymptote.accum_df <- data.frame(sites = accum$sites, richness = accum$richness)
mm_fit <- nls(richness ~ (a * sites) / (b + sites),
data = accum_df,
start = list(a = max(accum_df$richness), b = 5))
# Predictions
sites_seq <- seq(1, max(accum_df$sites), by = 0.1)
mm_pred <- data.frame(
sites = sites_seq,
richness = predict(mm_fit, newdata = data.frame(sites = sites_seq))
)
# Plot original accumulation curve
plot(accum, ci.type = "polygon", ci.col = "lightblue",
main = "Species Accumulation with Michaelis–Menten Fit",
xlab = "Quadrats", ylab = "Species richness")
# Add fitted Michaelis–Menten curve
lines(mm_pred$sites, mm_pred$richness, col = "red", lwd = 2)
legend("bottomright", legend = "Michaelis–Menten fit", col = "red", lwd = 2)
clench_fit <- nls(richness ~ (a * sites) / (1 + b * sites),
data = accum_df,
start = list(a = 1, b = 0.1))
clench_pred <- data.frame(
sites = sites_seq,
richness = predict(clench_fit, newdata = data.frame(sites = sites_seq))
)
# Plot accumulation with Clench fit
plot(accum, ci.type = "polygon", ci.col = "lightblue",
main = "Species Accumulation with Clench Fit",
xlab = "Quadrats", ylab = "Species richness")
# Add Clench fitted curve
lines(clench_pred$sites, clench_pred$richness, col = "darkgreen", lwd = 2)
legend("bottomright", legend = "Clench fit", col = "darkgreen", lwd = 2)
# Estimate asymptotic richness from Clench model
clench_params <- coef(clench_fit)
asymptote_est <- clench_params["a"] / clench_params["b"]
asymptote_est
## a
## 28.73077
This plot shows both models and a non-parametric Chao1 estimate for comparison. In this case the estimates are extremely similar!
chao_est <- specpool(species_pa)$chao
plot(accum, ci.type = "polygon", ci.col = "lightblue",
main = "Species Accumulation and Diversity Estimates",
xlab = "Quadrats", ylab = "Species Richness")
lines(mm_pred$sites, mm_pred$richness, col = "red", lwd = 4)
lines(clench_pred$sites, clench_pred$richness, col = "darkgreen", lwd = 2)
abline(h = chao_est, col = "blue", lty = 2, lwd = 2)
legend("bottomright",
legend = c("Michaelis–Menten", "Clench", "Chao1"),
col = c("red", "darkgreen", "blue"),
lty = c(1, 1, 2), lwd = 2)
When all groups have completed their analyses, we will:
Compare estimated species richness from each group
Discuss possible causes of differences: