Introduction

In this exercise, you will estimate species diversity using species accumulation curves and extrapolation models. You will use field data collected by your group from Svanninge Bjerge or Svanninge Bakker. The aim is to compare biodiversity between the different grassland areas surveyed by the four student groups.

What You’ll Do

  • Organise your species data for analysis.
  • Create species accumulation curves using R.
  • Estimate total species richness using:
    • Chao1 (a measure based on rare species)
    • Michaelis–Menten and Clench models (based on sampling patterns)
  • Compare results across sites and groups (we’ll do that together)
  • Think about how sampling effort and species rarity affect your results.

Learning Outcomes

By the end, you should be able to:

  • Prepare species data for biodiversity analysis.
  • Explain what species accumulation curves show.
  • Understand how Chao1 and curve-fitting models can estimate species richness.
  • Compare species richness between sites based on your data.

Please follow the steps below carefully. This document uses some simulated data — you should replace the simulated data with your own cleaned dataset. Your results will look similar (probably!) but definitely not the same.


Step 0 - load the packages you will use

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(vegan)
## Loading required package: permute
## Loading required package: lattice

Step 1 – Prepare Your Own Data

Make sure your dataset is in long format, with one row per species observation per quadrat. Your dataset should include at least these three columns:

Save your data as a CSV file and read it into R using:

# Replace this with the path to your CSV file
quadrat_data <- read.csv("your_data.csv")

Double-check column names and the first few rows:

names(quadrat_data)
head(quadrat_data)

This is a good point to check your species data for spelling errors. Take a look at the species list, and look for oddities.

table(quadrat_data$Species)

Step 2 – Example Using Simulated Data

This section uses simulated data for demonstration only! Use your own data.

# Simulated example (for reference only)
quadrat_data <- read.csv("simulated_quadrat_data_site1.csv")

# Check structure
names(quadrat_data)
## [1] "Site"      "QuadratID" "Species"
head(quadrat_data)
##    Site QuadratID              Species
## 1 Site1        Q1 Achillea_millefolium
## 2 Site1        Q1  Agrostis_capillaris
## 3 Site1        Q1      Bellis_perennis
## 4 Site1        Q1         Galium_verum
## 5 Site1        Q1     Knautia_arvensis
## 6 Site1        Q1 Leucanthemum_vulgare

Step 3 – Convert Data to Presence-Absence Matrix

We now transform the data from long format to wide format (presence/absence matrix), as required by the vegan package.

species_pa <- quadrat_data %>%
  distinct() %>% # Removes duplicates
  mutate(Presence = 1) %>%
  pivot_wider(names_from = Species, 
              values_from = Presence, 
              values_fill = 0) %>% 
  select(-Site, -QuadratID)

head(species_pa)
## # A tibble: 6 × 24
##   Achillea_millefolium Agrostis_capillaris Bellis_perennis Galium_verum
##                  <dbl>               <dbl>           <dbl>        <dbl>
## 1                    1                   1               1            1
## 2                    0                   0               1            1
## 3                    0                   0               0            0
## 4                    1                   0               0            1
## 5                    1                   0               1            0
## 6                    1                   1               0            1
## # ℹ 20 more variables: Knautia_arvensis <dbl>, Leucanthemum_vulgare <dbl>,
## #   Plantago_lanceolata <dbl>, Plantago_major <dbl>, Prunella_vulgaris <dbl>,
## #   Briza_media <dbl>, Festuca_rubra <dbl>, Hypochaeris_radicata <dbl>,
## #   Medicago_lupulina <dbl>, Rhinanthus_minor <dbl>, Rumex_acetosa <dbl>,
## #   Trifolium_repens <dbl>, Centaurea_jacea <dbl>, Trifolium_pratense <dbl>,
## #   Anthoxanthum_odoratum <dbl>, Campanula_rotundifolia <dbl>,
## #   Dactylis_glomerata <dbl>, Ranunculus_acris <dbl>, …

Step 4 – Create a Species Accumulation Curve

The specaccum() function in the vegan package calculates a species accumulation curve, which shows how the number of species increases as more quadrats are sampled. It does this by randomly adding quadrats one at a time and recording how many new species are found with each addition. This process is repeated many times to get an average curve and confidence intervals. The result helps us understand whether our sampling effort is sufficient—if the curve levels off, we’re likely capturing most of the species present; if it keeps rising, more sampling may be needed. Therefore the slope of the curve is very important!

accum <- specaccum(species_pa, method = "random")
accum
## Species Accumulation Curve
## Accumulation method: random, with 100 permutations
## Call: specaccum(comm = species_pa, method = "random") 
## 
##                                                                            
## Sites    1.000000  2.000000  3.000000  4.000000  5.000000  6.000000  7.0000
## Richness 9.820000 14.750000 18.140000 20.000000 21.280000 22.160000 22.8800
## sd       2.143253  2.105188  2.089029  1.632993  1.333939  1.079843  0.7691
##                                
## Sites     8.000000  9.000000 10
## Richness 23.320000 23.630000 24
## sd        0.601009  0.485237  0

Now we can plot the data.

plot(accum, ci.type = "polygon", ci.col = "lightblue",
     main = "Species Accumulation Curve",
     xlab = "Quadrats", ylab = "Species Richness")

The specpool function provides non-parametric estimators of total species richness based on this curve. These estimates are based on the number of rare species (species observed in only one or two quadrats) and are useful for gauging how many species are likely to be present but undetected.

specpool(species_pa)
##     Species  chao  chao.se jack1 jack1.se    jack2     boot  boot.se  n
## All      24 25.35 2.094636  26.7 1.558846 26.96667 25.52475 1.229653 10

Here, Species is observed number of species, while chao (Chao1), jack1 (Jackknife), and boot (Bootstrap) estimators give predictions of the true species richness by accounting for undetected, rare species. If these estimates are all fairly close to one another it suggests that the sampling effort was sufficient to capture most of the species present.

The Chao1 estimator is the most often used. It gives a lower-bound estimate of true species richness by focusing on rare species that are found in only one or two quadrats. It is calculated as:

\[\hat{S}_{\text{Chao1}} = S_{\text{obs}} + \frac{f_1^2}{2f_2}\]

where \(S_{\text{obs}}\) is the number of observed species, \(f_1\) is the number of species found in only one quadrat (singletons), and \(f_2\) is the number found in exactly two quadrats (doubletons). If no doubletons are observed, a corrected version is used. Chao1 is useful when sampling is incomplete, as it accounts for unseen species based on the frequency of rare ones.


Step 5 – Estimate Species Richness with Extrapolation Models

In this section, we fit mathematical models—such as the Michaelis–Menten and Clench functions—to the species accumulation curve to estimate total species richness. These models describe how species accumulate with increasing sampling effort and allow us to estimate the asymptote, or the point where no new species are likely to be found. This modelling approach can be more informative than simpler estimators like Chao1, because it uses the full pattern of accumulation across all samples, rather than relying only on rare species. As a result, these models often provide a smoother and potentially more accurate estimate of total diversity, especially when sampling effort varies or when some species are common.

Fit Michaelis–Menten Model

  • Based on enzyme kinetics (from biochemistry).
  • Formula: \(S = \frac{a \cdot x}{b + x}\)
  • Assumes a fixed upper limit to species richness (asymptote = \(a\)).
  • Therefore, you must first define species richness (e.g. from specaccum), which is problematic if your data has not actually reached an asymptote.
  • Curve flattens smoothly as sampling increases.
  • Often fits well when species accumulation slows down quickly.
accum_df <- data.frame(sites = accum$sites, richness = accum$richness)

mm_fit <- nls(richness ~ (a * sites) / (b + sites),
              data = accum_df,
              start = list(a = max(accum_df$richness), b = 5))

# Predictions
sites_seq <- seq(1, max(accum_df$sites), by = 0.1)
mm_pred <- data.frame(
  sites = sites_seq,
  richness = predict(mm_fit, newdata = data.frame(sites = sites_seq))
)

# Plot original accumulation curve
plot(accum, ci.type = "polygon", ci.col = "lightblue",
     main = "Species Accumulation with Michaelis–Menten Fit",
     xlab = "Quadrats", ylab = "Species richness")

# Add fitted Michaelis–Menten curve
lines(mm_pred$sites, mm_pred$richness, col = "red", lwd = 2)
legend("bottomright", legend = "Michaelis–Menten fit", col = "red", lwd = 2)

Fit Clench Model

  • Similar shape but flattens more gradually.
  • Formula: \(S = \frac{a \cdot x}{1 + b \cdot x}\)
  • Asymptote is calculated after fitting: \(\frac{a}{b}\)
  • You don’t need to assume a maximum richness
  • More flexible when rare species continue to appear late in sampling.
  • Often fits better when the curve keeps rising steadily.
clench_fit <- nls(richness ~ (a * sites) / (1 + b * sites),
                  data = accum_df,
                  start = list(a = 1, b = 0.1))

clench_pred <- data.frame(
  sites = sites_seq,
  richness = predict(clench_fit, newdata = data.frame(sites = sites_seq))
)

# Plot accumulation with Clench fit
plot(accum, ci.type = "polygon", ci.col = "lightblue",
     main = "Species Accumulation with Clench Fit",
     xlab = "Quadrats", ylab = "Species richness")

# Add Clench fitted curve
lines(clench_pred$sites, clench_pred$richness, col = "darkgreen", lwd = 2)
legend("bottomright", legend = "Clench fit", col = "darkgreen", lwd = 2)

# Estimate asymptotic richness from Clench model
clench_params <- coef(clench_fit)
asymptote_est <- clench_params["a"] / clench_params["b"]
asymptote_est
##        a 
## 28.73077

Step 6 – Combine All Results in One Plot

This plot shows both models and a non-parametric Chao1 estimate for comparison. In this case the estimates are extremely similar!

chao_est <- specpool(species_pa)$chao

plot(accum, ci.type = "polygon", ci.col = "lightblue",
     main = "Species Accumulation and Diversity Estimates",
     xlab = "Quadrats", ylab = "Species Richness")

lines(mm_pred$sites, mm_pred$richness, col = "red", lwd = 4)
lines(clench_pred$sites, clench_pred$richness, col = "darkgreen", lwd = 2)
abline(h = chao_est, col = "blue", lty = 2, lwd = 2)

legend("bottomright",
       legend = c("Michaelis–Menten", "Clench", "Chao1"),
       col = c("red", "darkgreen", "blue"),
       lty = c(1, 1, 2), lwd = 2)


Step 7 – Compare Results Across Groups

When all groups have completed their analyses, we will:


Notes and Reminders