1 Step 1. Estimating Diversity Profiles and Drawing Diversity Profiles Curves
2 Step 2. Plotting rarefaction/extrapolation curves and estimators for Hill numbers
3 Step 3. Cluster Analysis
4 Step 4. Rank-Abundance curves

Esto es un archivo R Markdown Notebook. Cuando ejecutas el código te será posible ver los resultados debajo de cada comando.

Ejecuta las siguientes líneas dando clic en el botón Run (en color verde) dentro del área de código o colocando el cursor dentro de dicho espacio y presionando Ctrl+Shift+Enter.

Script and illustrative data by Juan J. Morales-Trejo (just for Acknowledgements). e-mail to contact: coanocyte@gmail.com

Data and useful info can be found in: https://mega.nz/#F!zw0w2ZJK!MkMuS8mU2x1V2sL4tx0t7w

All your commentaries and improvements to this script will be very welcome.

# Citation (obligatory): All R-packages used must be cited, as well as R software.
#   citation()
#   citation("package name")

1 Step 1. Estimating Diversity Profiles and Drawing Diversity Profiles Curves

We are going to use the scripts and packages proposed by Chao and Jost (2015).

#   Install and load the SpadeR package
#   install.packages("SpadeR", dependencies = T)
library(SpadeR)

#   Load and create an object with my data for all sites' species abundances
#   objeto1 <- read.csv('C:/Path/to/your/file/matriz_abundancias.csv', header = T, row.names = 1)
objeto1 <- read.csv('/home/king-in-the-north/matriz_abundancias.csv', header = T, row.names = 1)
attach(objeto1)
# par(family = "Serif") use this command just if you use a Linux environment.
#   Next function will estimate and plot the diversity profile for q0, q1 and q2
#   You will obtain a complete table with useful data and diversity estimators.

Diversity(sitioA,"abundance",q=c(0,1,2)) # I just used the first column in my data. I mean, it was just used the first sampling site.

## 
## (1) BASIC DATA INFORMATION:
##                                Variable Value
##     Sample size                       n   247
##     Number of observed species        D     6
##     Estimated sample coverage         C     1
##     Estimated CV                     CV 1.174
## 
## (2) ESTIMATION OF SPECIES RICHNESS (DIVERSITY OF ORDER 0):
## 
##                              Estimate s.e. 95%Lower 95%Upper
##     Chao1 (Chao, 1984)              6  0.4        6        7
##     Chao1-bc                        6  0.4        6        7
##     iChao1                          6  0.4        6        7
##     ACE (Chao & Lee, 1992)          6  0.4        6        7
##     ACE-1 (Chao & Lee, 1992)        6  0.4        6        7
## 
##         Descriptions of richness estimators (See Species Part)
##        
## (3a) SHANNON ENTROPY:
## 
##                         Estimate  s.e. 95%Lower 95%Upper
##      MLE                   1.130 0.048    1.036    1.224
##      Jackknife             1.141 0.048    1.047    1.235
##      Chao & Shen           1.139 0.046    1.049    1.230
##      Chao et al. (2013)    1.141 0.048    1.047    1.234
## 
##         MLE: empirical or observed entropy.
##         Jackknife: see Zahl (1977).
##         Chao & Shen: based on the Horvitz-Thompson estimator and sample coverage method; see Chao and Shen (2003).
##         see Chao and Shen (2003).
##          Chao et al. (2013): A nearly optimal estimator of Shannon entropy; see Chao et al. (2013).
##          Estimated standard error is computed based on a bootstrap method.
##         
## (3b) SHANNON DIVERSITY (EXPONENTIAL OF SHANNON ENTROPY):
## 
##                         Estimate  s.e. 95%Lower 95%Upper
##      MLE                   3.096 0.146    2.809    3.383
##      Jackknife             3.130 0.148    2.840    3.420
##      Chao & Shen           3.125 0.143    2.846    3.404
##      Chao et al. (2013)    3.129 0.148    2.839    3.418
## 
## (4a) SIMPSON CONCENTRATION INDEX:
## 
##           Estimate    s.e. 95%Lower 95%Upper
##      MVUE  0.39623 0.02339  0.35040  0.44207
##      MLE   0.39868 0.02329  0.35303  0.44433
## 
##         MVUE: minimum variance unbiased estimator; see Eq. (2.27) of Magurran (1988).
##         MLE: maximum likelihood estimator or empirical index; see Eq. (2.26) of Magurran (1988).
##        
## (4b) SIMPSON DIVERSITY (INVERSE OF SIMPSON CONCENTRATION):
## 
##           Estimate    s.e. 95%Lower 95%Upper
##      MVUE  2.52376 0.13882  2.25167  2.79585
##      MLE   2.50828 0.13658  2.24058  2.77599
## 
## (5) CHAO AND JOST (2015) ESTIMATES OF HILL NUMBERS 
## 
##        q ChaoJost 95%Lower 95%Upper Empirical 95%Lower 95%Upper
##      1 0    6.000    5.051    6.949     6.000    5.406    6.594
##      2 1    3.129    2.790    3.468     3.096    2.761    3.431
##      3 2    2.524    2.236    2.812     2.508    2.224    2.792
## 
##         ChaoJost: diversity profile estimator derived by Chao and Jost (2015).
##          Empirical: maximum likelihood estimator (observed index).
##

The plot that you will obtain, has two types of lines (dotted and solid). Dotted represents your empirical data. And solid line represents the estimators for your data. Please check Chao & Jost (2015) paper attached to this script.

2 Step 2. Plotting rarefaction/extrapolation curves and estimators for Hill numbers

# Install and load iNEXT and ggplot2 packages
# install.packages("iNEXT", dependencies = T)
# install.packages("ggplot2", dependencies = T)
library(ggplot2)
library(iNEXT)

## 
## Attaching package: 'iNEXT'

## The following object is masked from 'package:SpadeR':
## 
##     ChaoSpecies

#   Load and create an object with my data for all sites' species abundances
#   You can use the last database to work with this example
#   objeto2 <- read.csv('C:/Path/to/your/file/matriz_abundancias.csv', header = T)
objeto2 <- read.csv('/home/king-in-the-north/matriz_abundancias_1.csv', header = T, row.names = 1)
attach(objeto2)

# You have to run this function. Have in count that you will use abundance data. If you want another approach for your data, please read Hsieh & Chao (2015) paper.

iNEXT(objeto2, q = 0, datatype = "abundance", endpoint = 520)

## Compare 6 assemblages with Hill number order q = 0.
## $class: iNEXT
## 
## $DataInfo: basic data information
##                site   n S.obs     SC f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
## 1     sitioA.canopy 247     6 1.0000  0  1  1  0  1  0  0  0  0   0
## 2     sitioB.canopy 260     5 1.0000  0  1  1  0  0  0  0  0  0   0
## 3     sitioC.canopy 212     5 1.0000  1  0  0  1  0  0  0  0  0   0
## 4 sitioA.understory 251     7 1.0000  0  0  0  0  0  0  0  1  0   1
## 5 sitioB.understory 122     9 0.9919  1  1  0  0  0  1  0  1  0   0
## 6 sitioC.understory 196     7 0.9950  1  2  1  0  0  0  0  1  0   0
## 
## $iNextEst: diversity estimates with rarefied and extrapolated samples.
## $sitioA.canopy
##      m       method order    qD qD.LCL qD.UCL    SC SC.LCL SC.UCL
## 1    1 interpolated     0 1.000  1.000  1.000 0.396  0.355  0.437
## 10 123 interpolated     0 5.593  4.842  6.345 0.992  0.987  0.996
## 20 247     observed     0 6.000  5.239  6.761 1.000  0.996  1.004
## 30 376 extrapolated     0 6.000  5.095  6.905 1.000  0.998  1.002
## 40 520 extrapolated     0 6.000  4.988  7.012 1.000  0.999  1.001
## 
## $sitioB.canopy
##      m       method order    qD qD.LCL qD.UCL    SC SC.LCL SC.UCL
## 1    1 interpolated     0 1.000  1.000  1.000 0.390  0.358  0.423
## 10 130 interpolated     0 4.627  3.869  5.385 0.993  0.989  0.997
## 20 260     observed     0 5.000  4.089  5.911 1.000  0.998  1.002
## 30 383 extrapolated     0 5.000  4.052  5.948 1.000  0.999  1.001
## 40 520 extrapolated     0 5.000  4.029  5.971 1.000  1.000  1.000
## 
## $sitioC.canopy
##      m       method order    qD qD.LCL qD.UCL    SC SC.LCL SC.UCL
## 1    1 interpolated     0 1.000  1.000  1.000 0.347  0.315  0.380
## 10 106 interpolated     0 4.439  3.540  5.338 0.993  0.988  0.998
## 20 212     observed     0 5.000  3.874  6.126 1.000  0.997  1.003
## 30 358 extrapolated     0 5.000  3.819  6.181 1.000  0.999  1.001
## 40 520 extrapolated     0 5.000  3.798  6.202 1.000  1.000  1.000
## 
## $sitioA.understory
##      m       method order    qD qD.LCL qD.UCL    SC SC.LCL SC.UCL
## 1    1 interpolated     0 1.000  1.000  1.000 0.226  0.190  0.262
## 10 125 interpolated     0 6.996  6.913  7.078 1.000  0.998  1.001
## 20 251     observed     0 7.000  7.000  7.000 1.000  1.000  1.000
## 30 378 extrapolated     0 7.000  7.000  7.000 1.000  1.000  1.000
## 40 520 extrapolated     0 7.000  7.000  7.000 1.000  1.000  1.000
## 
## $sitioB.understory
##      m       method order    qD qD.LCL qD.UCL    SC SC.LCL SC.UCL
## 1    1 interpolated     0 1.000  1.000  1.000 0.159  0.136  0.181
## 10  61 interpolated     0 8.235  7.361  9.109 0.982  0.969  0.995
## 20 122     observed     0 9.000  7.589 10.411 0.992  0.977  1.007
## 30 311 extrapolated     0 9.474  7.001 11.946 1.000  0.997  1.002
## 40 520 extrapolated     0 9.495  6.803 12.187 1.000  1.000  1.000
## 
## $sitioC.understory
##      m       method order    qD qD.LCL qD.UCL    SC SC.LCL SC.UCL
## 1    1 interpolated     0 1.000  1.000  1.000 0.573  0.469  0.678
## 10  98 interpolated     0 5.876  4.207  7.545 0.981  0.969  0.992
## 20 196     observed     0 7.000  4.683  9.317 0.995  0.985  1.005
## 30 350 extrapolated     0 7.238  4.203 10.273 1.000  0.995  1.005
## 40 520 extrapolated     0 7.248  3.738 10.759 1.000  0.997  1.003
## 
## 
## $AsyEst: asymptotic diversity estimates along with related statistics.
##                 Site         Diversity Observed Estimator  s.e.   LCL
## 1      sitioA.canopy  Species richness    6.000     6.000 0.413 6.000
## 2      sitioA.canopy Shannon diversity    3.096     3.129 0.162 3.096
## 3      sitioA.canopy Simpson diversity    2.508     2.524 0.163 2.508
## 4      sitioB.canopy  Species richness    5.000     5.000 0.405 5.000
## 5      sitioB.canopy Shannon diversity    2.897     2.920 0.121 2.897
## 6      sitioB.canopy Simpson diversity    2.547     2.562 0.111 2.547
## 7      sitioC.canopy  Species richness    5.000     5.000 0.500 5.000
## 8      sitioC.canopy Shannon diversity    3.155     3.187 0.131 3.155
## 9      sitioC.canopy Simpson diversity    2.853     2.878 0.148 2.853
## 10 sitioA.understory  Species richness    7.000     7.000 0.020 7.000
## 11 sitioA.understory Shannon diversity    5.323     5.387 0.234 5.323
## 12 sitioA.understory Simpson diversity    4.367     4.426 0.312 4.367
## 13 sitioB.understory  Species richness    9.000     9.496 1.314 9.029
## 14 sitioB.understory Shannon diversity    6.814     7.071 0.420 6.814
## 15 sitioB.understory Simpson diversity    6.045     6.309 0.420 6.045
## 16 sitioC.understory  Species richness    7.000     7.249 0.726 7.013
## 17 sitioC.understory Shannon diversity    2.346     2.387 0.202 2.346
## 18 sitioC.understory Simpson diversity    1.737     1.744 0.113 1.737
##       UCL
## 1   7.033
## 2   3.445
## 3   2.842
## 4   6.016
## 5   3.158
## 6   2.780
## 7   6.480
## 8   3.444
## 9   3.168
## 10  7.039
## 11  5.846
## 12  5.039
## 13 17.384
## 14  7.895
## 15  7.132
## 16 11.715
## 17  2.784
## 18  1.965
## 
## NOTE: Only show five estimates, call iNEXT.object$iNextEst. to show complete output.

I use objeto2 with my abundance data, in q you can set the desired order to estimate (q = 0) or you can request three orders by once setting q = c(0,1,2).

Endpoint corresponds to a reference sample. You have to set the double of the value n in order to extrapolate your data. In the case of my table, I have the sitioB.canopy with the maximum n. So I set it as the reference sample. This will be clearer when you plot the curves.

Now, you can create an object that you will use to plot the curves.

out <- iNEXT(objeto2, q = 0, datatype = "abundance", endpoint = 520)

#   Plot it

plot(out, type = 1, las = 1) # Open at maximum your Plot Section window to obtain all legends for the data (I recommend to use R Studio)

out1 <- iNEXT(objeto2, q = c(0,1,2), datatype = "abundance", endpoint = 520)

plot(out1, type = 1, show.legend = TRUE, las = 1)

3 Step 3. Cluster Analysis

#  Install "ecodist" and "pvclust" packages
#   install.packages("ecodist", dependencies = T)
#   install.packages("pvclust", dependencies = T)
library(ecodist)
library(pvclust)

#   Load the script "pvclust_bcdist.R" (Script by: Niel Shanson)
source("/home/king-in-the-north/MEGA/R/Scripts/pvclust_bcdist.R")
#   In the case of your computer
#   source('C:/Path/to/your/file/pvclust_bcdist.R')

#   Now we can use our database again

cluster_sites <- pvclust(objeto2, nboot = 1000, method.hclust = "average", method.dist = "bray")

## Bootstrap (r = 0.45)... Done.
## Bootstrap (r = 0.55)... Done.
## Bootstrap (r = 0.64)... Done.
## Bootstrap (r = 0.73)... Done.
## Bootstrap (r = 0.82)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.09)... Done.
## Bootstrap (r = 1.18)... Done.
## Bootstrap (r = 1.27)... Done.
## Bootstrap (r = 1.36)... Done.

#   It's recommended to use 1000 resamplings to obtain better results. You can see pvclust help document.

cluster_sites # Run

## 
## Cluster method: average
## Distance      : bray-curtis
## 
## Estimates on edges:
## 
##      au    bp se.au se.bp      v      c  pchi
## 1 0.433 0.618 0.028 0.005 -0.066 -0.234 0.227
## 2 0.999 0.981 0.000 0.002 -2.601  0.536 0.189
## 3 0.809 0.579 0.019 0.005 -0.537  0.339 0.396
## 4 0.615 0.290 0.029 0.005  0.131  0.423 0.903
## 5 1.000 1.000 0.000 0.000  0.000  0.000 0.000

par(mfrow = c(1,1))
plot(cluster_sites, las = 1, main = "Dendrograma de agrupación con valores AU/BP (%)", xlab = "Distancia: Bray-Curtis", sub = "Método de agrupación: Promedio", ylab = "Distancia de agrupación")

# Next command print a rectangle around significative clusters
pvrect(cluster_sites, alpha = 0.95, border = "#139BCD", lwd = 2)

4 Step 4. Rank-Abundance curves

# install the next packages, you can skip this commands if you have already installed it
#install.packages(c("devtools","ggplot2"))
#library(devtools)
#install_github('JohnsonHsieh/Jade')
library(Jade)

#   In this case, you will use the same database with a little change. You don't set the parameter row.names now. That's because we are going to use the Especies column in the next sections.
objeto3 <- read.csv('/home/king-in-the-north/matriz_abundancias_1.csv', header = T)
attach(objeto3)

## The following objects are masked from objeto2:
## 
##     sitioA.canopy, sitioA.understory, sitioB.canopy,
##     sitioB.understory, sitioC.canopy, sitioC.understory

#   We are going to calculate the relative abundance according with Chao et al 2015. (Paper attached). Then, we will plot the species-rank abundances curves with the first three data more abundant.
#   You can run every object created with SpecDist in order to review the data output.

sitioA.dosel <- SpecDist(sitioA.canopy, "abundance")
sitioA.dosel

##    rank   method probability
## 4     1 detected 0.562753036
## 10    2 detected 0.234817814
## 5     3 detected 0.161943320
## 3     4 detected 0.020242915
## 11    5 detected 0.012145749
## 1     6 detected 0.008097166
## 2     7 detected 0.000000000
## 6     8 detected 0.000000000
## 7     9 detected 0.000000000
## 8    10 detected 0.000000000
## 9    11 detected 0.000000000

Check first column. It has the number of each species according its relative abundance: 4 corresponds to Atrod in this case.

par(family = "Serif", mfrow = c(2,3))
# Use par command with the aim to get six plots by graphical window.
#You can skip family type if you get problems in your script.

plot(sitioA.dosel$probability ~ sitioA.dosel$rank, pch = 19+as.numeric(sitioA.dosel$method), bg = c(2), xlab = "Rango de Especies", ylab = "Abundancia relativa", main = "Sitio A", sub = "Dosel")
lines(sitioA.dosel$rank, sitioA.dosel$probability, col = "#139BCD")
nombre <- Especies[as.numeric(rownames(sitioA.dosel))]
text(sitioA.dosel$rank[1:3], sitioA.dosel$probability[1:3], nombre[1:3], pos = 4)

sitioB.dosel <- SpecDist(sitioB.canopy, "abundance")
plot(sitioB.dosel$probability ~ sitioB.dosel$rank, pch = 19+as.numeric(sitioB.dosel$method), bg = c(2), xlab = "Rango de Especies", ylab = "Abundancia relativa", main = "Sitio B", sub = "Dosel")
lines(sitioB.dosel$rank, sitioB.dosel$probability, col = "#139BCD")
nombre <- Especies[as.numeric(rownames(sitioB.dosel))]
text(sitioB.dosel$rank[1:3], sitioB.dosel$probability[1:3], nombre[1:3], pos = 4)

sitioC.dosel <- SpecDist(sitioC.canopy, "abundance")
plot(sitioC.dosel$probability ~ sitioC.dosel$rank, pch = 19+as.numeric(sitioC.dosel$method), bg = c(2), xlab = "Rango de Especies", ylab = "Abundancia relativa", main = "Sitio C", sub = "Dosel")
lines(sitioC.dosel$rank, sitioC.dosel$probability, col = "#139BCD")
nombre <- Especies[as.numeric(rownames(sitioC.dosel))]
text(sitioC.dosel$rank[1:3], sitioC.dosel$probability[1:3], nombre[1:3], pos = 4)

sitioA.soto <- SpecDist(sitioA.understory, "abundance")
plot(sitioA.soto$probability ~ sitioA.soto$rank, pch = 20+as.numeric(sitioA.soto$method), bg = c(3), xlab = "Rango de Especies", ylab = "Abundancia relativa", sub = "Sotobosque")
lines(sitioA.soto$probability ~ sitioA.soto$rank)
nombre <- Especies[as.numeric(rownames(sitioA.soto))]
text(sitioA.soto$rank[1:3], sitioA.soto$probability[1:3], nombre[1:3], pos = 4)

sitioB.soto <- SpecDist(sitioB.understory, "abundance")
plot(sitioB.soto$probability ~ sitioB.soto$rank, pch = 20+as.numeric(sitioB.soto$method), bg = c(3), xlab = "Rango de Especies", ylab = "Abundancia relativa", sub = "Sotobosque")
lines(sitioB.soto$probability ~ sitioB.soto$rank)
nombre <- Especies[as.numeric(rownames(sitioB.soto))]
text(sitioB.soto$rank[1:3], sitioB.soto$probability[1:3], nombre[1:3], pos = 4)

sitioC.soto <- SpecDist(sitioC.understory, "abundance")
plot(sitioC.soto$probability ~ sitioC.soto$rank, pch = 20+as.numeric(sitioC.soto$method), bg = c(3), xlab = "Rango de Especies", ylab = "Abundancia relativa", sub = "Sotobosque")
lines(sitioC.soto$probability ~ sitioC.soto$rank)
nombre <- Especies[as.numeric(rownames(sitioC.soto))]
text(sitioC.soto$rank[1:3], sitioC.soto$probability[1:3], nombre[1:3], pos = 4)

Diversity estimations and data clustering

1 Step 1. Estimating Diversity Profiles and Drawing Diversity Profiles Curves

2 Step 2. Plotting rarefaction/extrapolation curves and estimators for Hill numbers

3 Step 3. Cluster Analysis

4 Step 4. Rank-Abundance curves