The vegan package provides tools for descriptive community ecology. It can be found on CRAN.

Some of the data used here is from Mark Gardener’s book: Statistics for Ecologists Using R and Excel


Preliminaries

Start a new script in your RStuff/scripts folder. Open your RStuff project using File/OpenProject.

Load packages

library(tidyverse)
library(here)
library(vegan)

Species richness

Suppose we have visited several sites and listed the different species present at each site. We have not recorded how many of each species are present.

Read in data

plrich<-read_csv(here("data","plant_species_lists.csv"))
glimpse(plrich)
## Rows: 183
## Columns: 2
## $ Site    <chr> "ML1", "ML1", "ML1", "ML1", "ML1", "ML1", "ML1", "ML1", "ML1",…
## $ Species <chr> "Achillea millefolium", "Centaurea nigra", "Lathyrus pratensis…
head(plrich)

Species richness the classic R way

First we create a table indicating the presence/absence of each of the species at each of the sites.

ps<-table(plrich$Species,plrich$Site)
ps # - uncomment this if you want to see the table we have just created -it is big!
##                           
##                            ML1 ML2 MU1 MU2 PL2 PU2 SL1 SL2 SU1 SU2
##   Achillea millefolium       1   1   1   1   0   1   0   0   0   0
##   Aegopodium podagraris      0   0   0   0   0   0   0   1   0   0
##   Agrostis capillaris        0   1   1   1   1   0   1   1   1   0
##   Agrostis stolonifera       0   0   0   0   0   1   0   1   0   0
##   Anthriscus sylvestris      0   0   0   0   0   0   0   0   1   1
##   Arctium minus              0   0   0   0   0   0   0   0   0   1
##   Arrhenatherum elatius      0   0   0   0   1   0   0   1   1   1
##   Bidens cernua              0   0   0   0   1   0   0   0   0   0
##   Brachythecium rutabulum    0   0   0   0   1   0   1   1   1   0
##   Bromus hordeaceus          0   0   0   0   0   0   1   0   1   0
##   Calystegia sepium          0   0   0   0   0   0   0   1   0   0
##   Capsella bursa-pastoris    0   0   0   0   1   1   0   0   0   0
##   Cardamine pratensis        0   0   0   0   1   0   0   0   0   0
##   Centaurea nigra            1   1   1   1   0   0   0   0   0   0
##   Cerastium fontanum         0   0   1   0   0   1   0   0   1   1
##   Chamerion angustifolium    0   0   0   0   0   0   0   0   1   0
##   Chenopodium album          0   0   0   0   0   0   0   0   1   0
##   Cirsium arvense            1   0   1   1   1   1   1   1   1   1
##   Cirsium palustre           0   0   0   0   1   0   0   0   0   0
##   Crataegus monogyna (s)     0   0   0   0   0   0   0   0   1   0
##   Cynosurus cristatus        1   0   0   1   0   0   0   0   0   0
##   Dactylis glomerata         0   0   0   0   0   0   0   1   1   1
##   Deschampsia flexuosa       1   0   1   0   0   0   0   0   0   0
##   Elytrigia repens           0   0   0   0   0   0   0   1   0   0
##   Epilobium hirsutum         0   0   0   0   0   0   0   1   0   1
##   Epilobium montanum         0   0   0   0   1   0   0   0   0   1
##   Fallopia convolvulus       0   0   0   0   0   0   0   0   1   1
##   Festuca rubra              1   0   1   0   0   0   0   0   0   0
##   Filipendula ulmaria        0   0   0   0   0   0   0   1   0   0
##   Fraxinus excelsior (s)     0   0   0   0   0   0   0   0   1   0
##   Galium aparine             0   0   0   0   0   0   1   1   1   1
##   Galium verum               0   1   1   0   0   0   0   0   0   0
##   Geranium columbinum        0   0   0   0   0   0   1   0   0   0
##   Geranium molle             0   0   0   0   0   0   1   0   0   0
##   Glechoma hederacea         0   0   0   0   0   0   0   0   0   1
##   Hedera helix (g)           0   0   0   0   0   0   0   0   0   1
##   Heracleum sphondylium      0   0   0   0   0   0   0   0   1   1
##   Holcus lanatus             1   1   1   0   0   1   1   1   1   1
##   Impatiens glandulifera     0   0   0   0   0   0   0   1   0   0
##   Juncus effusus             0   0   0   0   1   0   0   0   0   0
##   Juncus inflexus            0   0   0   0   0   0   0   0   0   1
##   Lathyrus pratensis         1   0   0   0   0   0   0   1   0   0
##   Leucanthemum vulgare       1   1   1   1   0   0   0   0   0   0
##   Lolium perenne             0   0   0   0   0   1   1   0   0   1
##   Lotus corniculatus         1   1   1   1   0   1   0   0   0   0
##   Phalaris arundinacea       0   0   0   0   0   0   0   1   0   0
##   Plantago lanceolata        1   1   1   1   0   0   0   0   0   0
##   Plantago major             0   1   1   1   0   0   0   0   0   0
##   Poa pratensis              0   0   0   0   0   0   0   0   1   0
##   Poa trivialis              0   0   0   0   0   0   0   1   0   0
##   Prunella vulgaris          1   1   0   1   0   0   0   0   0   0
##   Quercus robur (s)          0   1   0   0   0   0   0   0   1   0
##   Quercus seedling/sp        0   0   1   0   0   0   0   0   0   0
##   Ranunculus acris           1   0   1   0   0   0   0   0   0   0
##   Ranunculus repens          0   1   1   1   1   1   1   1   1   0
##   Rubus fruticosus agg (g)   0   0   0   0   0   0   1   1   1   1
##   Rumex acetosa              0   1   1   1   0   0   0   0   0   0
##   Rumex crispus              0   0   0   0   0   0   1   0   1   0
##   Rumex obtusifolius         0   0   1   0   0   0   0   1   1   1
##   Rumex sanguineus           0   0   0   0   0   0   1   0   0   0
##   Salix caprea (s)           0   0   0   0   0   0   0   0   0   1
##   Salix fragilis (s)         0   0   0   0   0   0   0   1   0   0
##   Silene dioica              0   0   0   0   0   0   0   0   0   1
##   Sonchus arvensis           0   0   0   0   0   0   1   0   1   1
##   Sonchus asper              0   0   0   0   0   0   0   0   0   1
##   Stachys sylvatica          0   0   0   0   0   0   0   1   1   1
##   Symphytum officinale       0   0   0   0   0   0   0   1   0   0
##   Tanacetum vulgare          0   0   0   0   0   0   0   0   0   1
##   Taraxacum seedling/sp      0   0   1   1   0   0   0   0   0   0
##   Thuidium tamariscinum      0   0   0   0   0   0   1   0   0   0
##   Trifolium dubium           0   0   0   0   0   1   0   0   0   0
##   Trifolium pratense         1   1   1   1   0   0   0   0   0   0
##   Trifolium repens           1   1   1   0   1   1   0   0   0   0
##   Urtica dioica              0   1   0   0   1   0   1   1   1   1
##   Veronica arvensis          0   0   0   0   0   0   0   0   1   0
##   Vicia hirsuta              0   0   0   0   0   0   0   0   1   1

Then we calculate the sum of each column. This will give us the number of species at each site.

psr<-colSums(ps)
psr
## ML1 ML2 MU1 MU2 PL2 PU2 SL1 SL2 SU1 SU2 
##  15  16  21  14  13  11  16  24  27  26

Species richness the tidyverse way

plr<-plrich %>%
  group_by(Site) %>%
  summarise(species.richness=n()) %>%
  arrange(-species.richness)
plr

Species richness the vegan way

ps<-table(plrich$Species,plrich$Site)
specnumber(ps,MARGIN=2)
## ML1 ML2 MU1 MU2 PL2 PU2 SL1 SL2 SU1 SU2 
##  15  16  21  14  13  11  16  24  27  26

Species richness of the dune data set.

This data set is built into the vegan package. It contains cover class values of 30 species at 20 sites, with the sites arranged by row. This is the default expectation of functions in vegan. The species names are abbreviated to 4+4 letters.

data(dune)
specnumber(dune,MARGIN=1)
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
##  5 10 10 13 14 11 13 12 13 12  9  9 10  7  8  8  7  9  9  8

Note that we use MARGIN=1 in this case, since the sites are arranged by row, rather than by column. In fact, we could have left it out, since as we can see from the help for specnumber, the default value for MARGIN is 1, ie sites in rows, species in columns, which is the default expectation of vegan. If we wanted the opposite, sites in columns and species in rows, then we would have specified MARGIN=0. But that would be perverse. Why swim against the tide? If using vegan, it is less work for you if you have sites in rows and species in columns.

Diversity Index

‘Better stories can be told about Simpson’s index than about Shannon’s index, and still grander narratives about rarefaction (Hurlbert 1971). However, these indices are all very closely related (Hill 1973), and there is no reason to despise one more than others (but if you are a graduate student, don’t drag me in, but obey your Professor’s orders).’

Given the proportion \(p_i\) of each of several species \(i\) within a population, we can estimate the so-called diversity of the population in one of several ways from proportions \(\hat p_i\) of each species within a sample. The proportions can be of total counts or of total percentage cover.

Shannon’s Diversity Index

This calculates \(H\) where

\[H=-\sum{p_i\log_b p_i}\] where \(b\) is the base of the log. Most commonly, natural logs to base \(e^1\) are used. Those are the ones often denoted ln on calculators and in some books. Or sometimes log_e, or sometimes just log. Confusing, eh? It is usually clear from the context when natural logs are being used.

Simpson’s and Inverse Simpson’s Diversity Index

These are based on the quantity

\[D=\sum_ip_i^2\]

Simpson’s index is \(1-D\) and varies from 0 to 1 while the inverse Simpson’s index is \(1/D\)

bird<-read_csv(here("data","bird.csv"))
## Rows: 6 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Species
## dbl (5): Garden, Hedgerow, Parkland, Pasture, Woodland
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(bird)
## Rows: 6
## Columns: 6
## $ Species  <chr> "Blackbird", "Chaffinch", "Great Tit", "House Sparrow", "Robi…
## $ Garden   <dbl> 47, 19, 50, 46, 9, 4
## $ Hedgerow <dbl> 10, 3, 0, 16, 3, 0
## $ Parkland <dbl> 40, 5, 10, 8, 0, 6
## $ Pasture  <dbl> 2, 0, 7, 4, 0, 0
## $ Woodland <dbl> 2, 2, 0, 0, 2, 0
tidy.bird<- bird %>%
  pivot_longer(-Species,names_to="habitat",values_to="abundance") %>%
  arrange(habitat)
dis_homemade<-tidy.bird %>%
  group_by(habitat) %>%
  filter(abundance>0) %>%
  summarise(N=sum(abundance),
            shannon.di=-sum((abundance/sum(abundance))*log(abundance/sum(abundance))),
            simpson.di=1-sum((abundance/sum(abundance))^2),
            inv.simpson.di=1/sum((abundance/sum(abundance))^2)) %>%
  arrange(-shannon.di)
dis_homemade

and now let’s do that using the diversity() function in vegan. I have left in the MARGIN and base arguments, but we could have left them out since we are happy to use their default values.

dis_vegan<-tidy.bird %>%
  group_by(habitat) %>%
  summarise(N=sum(abundance),
            shannon.di=diversity(abundance, index = "shannon", MARGIN = 2),
            simpson.di=diversity(abundance, index = "simpson", MARGIN = 2),
            inv.simpson.di=diversity(abundance, index = "invsimpson", MARGIN = 2)) %>%
  arrange(-shannon.di)
dis_vegan

Now let us find the Shannon diversity index of the 20 sites in the dune dataset:

diversity(dune,"shannon")
##        1        2        3        4        5        6        7        8 
## 1.440482 2.252516 2.193749 2.426779 2.544421 2.345946 2.471733 2.434898 
##        9       10       11       12       13       14       15       16 
## 2.493568 2.398613 2.106065 2.114495 2.099638 1.863680 1.979309 1.959795 
##       17       18       19       20 
## 1.876274 2.079387 2.134024 2.048270

See how easy this is to implement when we have arranged our data as in the dune dataset.

Are these diversities different?

If we have a number of habitats, and for each of them we have arrived at one or other of the diversity indices considered above, we might reasonably ask if there is a difference between the indices.

This is difficult. For each habitat we have a single number, so we cannot use a t-test/ANOVA, and the number is not a count, so we cannot use a chi-squared test.

(Dis)similarity

The vegan package has the function vegdist() which permits dissimilarity indices to be calculated for both presence-absence data and for data where abundances have been recorded in some form (eg count, frequency, percentage cover).

Let us calculate the Jaccard index of dissimilarity between the sites of the dune dataset. This gives cover class values of 30 species on 20 sites.

dune.dist<-vegdist(dune,method="jaccard",binary=FALSE)

The varespec dataset, in contrast, contains estimated percentage cover values of 44 species at 24 sites, the sites arranged by row.

data(varespec)
vare.dist<-vegdist(varespec,binary=FALSE)

Standardising the data

If the data need to be standardised in some way before calculating the dissimilarity index, this can be done using the decostand() function.

For example:

vare.dist <- vegdist(decostand(varespec, "norm"), "euclidean")

This particular standardisation has normalised the data by making the sums of squares of the cover values in each row add up to 1. It results in a distance measure which varies between 0 and \(\sqrt{2}\)

Visualising similarity

vegdist() produces a matrix of distance values between every pair of sites. A dendrogram is a useful way to visualise this.

dune.hc<-hclust(dune.dist)
plot(dune.hc)

vare.hc<-hclust(vare.dist)
plot(vare.hc)

Summary code

For the dune data, find the species richness and Simpson diversity index for each site, then create a dendrogram that illustrates which sites are most similar/dissimilar to each other

We suppose that we have already the data into a data frame called dune from a .csv file, which is arranged the vegan way, with sites as rows, species as columns.

# species richness
specnumber(dune)
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
##  5 10 10 13 14 11 13 12 13 12  9  9 10  7  8  8  7  9  9  8
# Simpson's diversity index
diversity(dune, index = "simpson")
##         1         2         3         4         5         6         7         8 
## 0.7345679 0.8900227 0.8787500 0.9007407 0.9140076 0.9001736 0.9075000 0.9087500 
##         9        10        11        12        13        14        15        16 
## 0.9115646 0.9031909 0.8671875 0.8685714 0.8521579 0.8333333 0.8506616 0.8429752 
##        17        18        19        20 
## 0.8355556 0.8614540 0.8740895 0.8678460
# Dissimilarity dendrogram
dune.dist<-vegdist(dune)
dune.hc<-hclust(dune.dist)
plot(dune.hc)