Community annalisys in the R package Vegan


First we want to load some data, the Oribatid mite data. 70 soil cores collected by Daniel Borcard in 1989. See Borcard et al. (1992, 1994) for details.

References

Borcard, D., P. Legendre and P. Drapeau. 1992. Partialling out the spatial component of ecological variation. Ecology 73: 1045-1055.

Borcard, D. and P. Legendre. 1994. Environmental control and spatial structure in ecological communities: an example using Oribatid mites (Acari, Oribatei). Environmental and Ecological Statistics 1: 37-61.

Borcard, D. and P. Legendre. 2002. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153: 51-68.

#load mite data (use ?mite to get more information)
data(mite) #community data on mites collected from 70 soil cores
data(mite.env) #Environmental data collects from cores locations 
data(mite.pcnm) #Principal components of neighbor matrices (Spatial data)
skim(mite)
Data summary
Name mite
Number of rows 70
Number of columns 35
_______________________
Column type frequency:
numeric 35
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Brachy 0 1 8.73 10.08 0 3.00 4.5 11.75 42 ▇▁▂▁▁
PHTH 0 1 1.27 2.17 0 0.00 0.0 2.00 8 ▇▂▁▁▁
HPAV 0 1 8.51 7.56 0 4.00 6.5 12.00 37 ▇▅▂▁▁
RARD 0 1 1.21 2.78 0 0.00 0.0 1.00 13 ▇▁▁▁▁
SSTR 0 1 0.31 0.97 0 0.00 0.0 0.00 6 ▇▁▁▁▁
Protopl 0 1 0.37 1.61 0 0.00 0.0 0.00 13 ▇▁▁▁▁
MEGR 0 1 2.19 3.62 0 0.00 1.0 3.00 17 ▇▁▁▁▁
MPRO 0 1 0.16 0.47 0 0.00 0.0 0.00 2 ▇▁▁▁▁
TVIE 0 1 0.83 1.47 0 0.00 0.0 1.00 7 ▇▁▁▁▁
HMIN 0 1 4.91 8.47 0 0.00 0.0 4.75 36 ▇▁▁▁▁
HMIN2 0 1 1.96 3.92 0 0.00 0.0 2.75 20 ▇▁▁▁▁
NPRA 0 1 1.89 2.37 0 0.00 1.0 2.75 10 ▇▂▁▁▁
TVEL 0 1 9.06 10.93 0 0.00 3.0 19.00 42 ▇▂▃▁▁
ONOV 0 1 17.27 18.05 0 5.00 10.5 24.25 73 ▇▂▁▁▁
SUCT 0 1 16.96 13.89 0 7.25 13.5 24.00 63 ▇▅▂▁▁
LCIL 0 1 35.26 88.85 0 1.25 13.0 44.00 723 ▇▁▁▁▁
Oribatl1 0 1 1.89 3.43 0 0.00 0.0 2.75 17 ▇▁▁▁▁
Ceratoz1 0 1 1.29 1.46 0 0.00 1.0 2.00 5 ▇▂▁▁▁
PWIL 0 1 1.09 1.71 0 0.00 0.0 1.00 8 ▇▂▁▁▁
Galumna1 0 1 0.96 1.73 0 0.00 0.0 1.00 8 ▇▁▁▁▁
Stgncrs2 0 1 0.73 1.83 0 0.00 0.0 0.00 9 ▇▁▁▁▁
HRUF 0 1 0.23 0.62 0 0.00 0.0 0.00 3 ▇▁▁▁▁
Trhypch1 0 1 2.61 6.14 0 0.00 0.0 2.00 29 ▇▁▁▁▁
PPEL 0 1 0.17 0.54 0 0.00 0.0 0.00 3 ▇▁▁▁▁
NCOR 0 1 1.13 1.65 0 0.00 0.5 1.75 7 ▇▁▁▁▁
SLAT 0 1 0.40 1.23 0 0.00 0.0 0.00 8 ▇▁▁▁▁
FSET 0 1 1.86 3.18 0 0.00 0.0 2.00 12 ▇▁▁▁▁
Lepidzts 0 1 0.17 0.54 0 0.00 0.0 0.00 3 ▇▁▁▁▁
Eupelops 0 1 0.64 0.99 0 0.00 0.0 1.00 4 ▇▃▁▁▁
Miniglmn 0 1 0.24 0.79 0 0.00 0.0 0.00 5 ▇▁▁▁▁
LRUG 0 1 10.43 12.66 0 0.00 4.5 17.75 57 ▇▃▂▁▁
PLAG2 0 1 0.80 1.79 0 0.00 0.0 1.00 9 ▇▁▁▁▁
Ceratoz3 0 1 1.30 2.20 0 0.00 0.0 2.00 9 ▇▂▁▁▁
Oppiminu 0 1 1.11 1.84 0 0.00 0.0 1.75 9 ▇▂▁▁▁
Trimalc2 0 1 2.07 5.79 0 0.00 0.0 0.00 33 ▇▁▁▁▁
skim(mite.env)
Data summary
Name mite.env
Number of rows 70
Number of columns 5
_______________________
Column type frequency:
factor 3
numeric 2
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
Substrate 0 1 FALSE 7 Int: 27, Sph: 25, Sph: 11, Sph: 2
Shrub 0 1 TRUE 3 Few: 26, Man: 25, Non: 19
Topo 0 1 FALSE 2 Bla: 44, Hum: 26

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
SubsDens 0 1 39.28 11.94 21.17 30.01 36.38 46.81 80.59 ▇▇▅▂▁
WatrCont 0 1 410.64 142.36 134.13 314.13 398.47 492.82 826.96 ▃▇▆▂▁
skim(mite.pcnm)
Data summary
Name mite.pcnm
Number of rows 70
Number of columns 22
_______________________
Column type frequency:
numeric 22
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
V1 0 1 0 0.12 -0.29 -0.07 0.03 0.05 0.20 ▂▂▂▇▃
V2 0 1 0 0.12 -0.24 -0.09 -0.03 0.10 0.22 ▃▇▆▇▃
V3 0 1 0 0.12 -0.32 -0.02 0.00 0.04 0.24 ▂▁▇▆▂
V4 0 1 0 0.12 -0.27 -0.06 -0.01 0.03 0.30 ▂▃▇▂▂
V5 0 1 0 0.12 -0.26 -0.08 0.00 0.11 0.18 ▃▅▆▅▇
V6 0 1 0 0.12 -0.23 -0.08 0.01 0.06 0.28 ▃▇▇▅▁
V7 0 1 0 0.12 -0.25 -0.03 -0.01 0.06 0.35 ▃▇▆▂▁
V8 0 1 0 0.12 -0.39 -0.06 0.02 0.08 0.23 ▁▁▆▇▂
V9 0 1 0 0.12 -0.32 -0.06 0.00 0.06 0.25 ▁▂▇▃▂
V10 0 1 0 0.12 -0.25 -0.08 -0.01 0.10 0.30 ▃▆▇▆▁
V11 0 1 0 0.12 -0.38 -0.05 0.02 0.08 0.21 ▁▂▃▇▃
V12 0 1 0 0.12 -0.36 -0.05 -0.01 0.04 0.33 ▁▂▇▂▁
V13 0 1 0 0.12 -0.33 -0.06 -0.01 0.07 0.37 ▁▃▇▂▁
V14 0 1 0 0.12 -0.25 -0.06 -0.01 0.05 0.43 ▂▇▃▁▁
V15 0 1 0 0.12 -0.27 -0.08 -0.01 0.06 0.35 ▂▆▇▁▁
V16 0 1 0 0.12 -0.54 -0.03 0.01 0.08 0.24 ▁▁▂▇▃
V17 0 1 0 0.12 -0.24 -0.07 -0.01 0.06 0.42 ▂▇▅▁▁
V18 0 1 0 0.12 -0.31 -0.06 0.00 0.04 0.45 ▁▇▇▁▁
V19 0 1 0 0.12 -0.33 -0.07 0.00 0.07 0.34 ▁▃▇▃▁
V20 0 1 0 0.12 -0.29 -0.07 -0.01 0.07 0.33 ▁▅▇▂▁
V21 0 1 0 0.12 -0.29 -0.08 0.01 0.07 0.29 ▂▅▇▅▂
V22 0 1 0 0.12 -0.30 -0.05 0.00 0.06 0.27 ▁▃▇▃▂

Unconstrained ordination example

PCA

A PCA is a linear combination of the explanatory variables, is a method to reduce dimensionality.

pca<- rda(mite.env[,1:2]) #pca only on the numerical variables substrate density and water content
summary(pca)
## 
## Call:
## rda(X = mite.env[, 1:2]) 
## 
## Partitioning of variance:
##               Inertia Proportion
## Total           20410          1
## Unconstrained   20410          1
## 
## Eigenvalues, and their contribution to the variance 
## 
## Importance of components:
##                             PC1       PC2
## Eigenvalue            2.029e+04 124.71444
## Proportion Explained  9.939e-01   0.00611
## Cumulative Proportion 9.939e-01   1.00000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  34.44874 
## 
## 
## Species scores
## 
##             PC1      PC2
## SubsDens  1.024 -2.69164
## WatrCont 34.328  0.08033
## 
## 
## Site scores (weighted sums of species scores)
## 
##        PC1       PC2
## 1  -1.7605  -0.63174
## 2   0.7172  -5.56245
## 3  -1.1267  -2.95030
## 4  -1.4515  -3.86151
## 5  -6.0240   3.55246
## 6  -2.8682  -7.79272
## 7  -0.9248   0.51483
## 8  -4.1510 -16.92626
## 9  -2.8894  -9.32773
## 10 -5.5334   0.54782
## 11 -8.0508  -1.69209
## 12 -0.1310  -2.84254
## 13 -4.8685   2.35013
## 14 -4.9823  -1.14094
## 15 -1.7282  -8.32852
## 16 -2.5869   0.45439
## 17 -3.3172   2.34980
## 18 -3.9174   3.50344
## 19 -0.7760  -2.08491
## 20 -7.7121  -2.68514
## 21 -6.6066   2.44648
## 22 -6.5469  -0.06083
## 23 -0.3529   2.57898
## 24 -3.4709   1.28450
## 25 -3.4130   0.18084
## 26  2.2783  -3.66056
## 27 -7.7373   3.14183
## 28 -1.6351   0.66803
## 29 -2.5527   1.41477
## 30 -0.2139   0.33215
## 31  2.4373  -0.78762
## 32 -3.6795  -0.67481
## 33 -0.6817   0.79685
## 34 -1.2547  -5.63682
## 35 -3.5993   2.39848
## 36  0.2149  -0.03964
## 37 -0.4992   1.49046
## 38  3.4360  -1.83415
## 39  8.2051  -6.33862
## 40 -0.3663   1.72679
## 41  1.6714   1.74071
## 42 -0.7173   4.43907
## 43  4.9735  -4.55923
## 44  8.6794  -5.27756
## 45  3.7283  -1.37739
## 46  1.1920  -1.30245
## 47 -2.5273   1.00834
## 48  2.8027   0.16508
## 49 -1.3723   4.27520
## 50  0.6786   4.34247
## 51  1.5886   1.26338
## 52  5.1612   5.83680
## 53  0.1060   4.66696
## 54  3.4944  -1.01316
## 55  1.0709   3.12088
## 56  3.5597   0.57581
## 57  5.9163  -4.26354
## 58 -1.6766   3.83617
## 59  7.1631  -2.26949
## 60  1.5987   3.54836
## 61  6.2853  -0.48591
## 62  6.5309  -0.97203
## 63  2.0812   2.37225
## 64  0.6982   4.43166
## 65  1.9853   7.48505
## 66  3.0263   4.05830
## 67 12.1282  -0.15315
## 68  5.2148   5.71602
## 69  1.9878   4.45436
## 70  3.0915   7.46394
autoplot(pca, arrows = T)

Inertia means the amount of variance that is explained by the environmental variables. In the summary The Eigen values of each PC represents the amount of inertia of that particular PC.

The vectors in the plot are the amount of variability associated with each variable represented in the length of the vector. In this example the water content is the most important factor, the sites on the right of the 0 value are the most humid and the sites on the left side the most driest.

Constrained ordination example

In ecology is important to understand how is the relationship of the variables with the community, in this case we want to know how the substrate density and the water content is related to the community, so we can explore this especific relation.

Redundancy annalisys

rda<- rda(mite~SubsDens+WatrCont, data=mite.env)
rda
## Call: rda(formula = mite ~ SubsDens + WatrCont, data = mite.env)
## 
##                 Inertia Proportion Rank
## Total         9098.5913     1.0000     
## Constrained   1977.8327     0.2174    2
## Unconstrained 7120.7586     0.7826   35
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##   RDA1   RDA2 
## 1919.3   58.6 
## 
## Eigenvalues for unconstrained axes:
##  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8 
## 6252  289  153  112   92   57   41   30 
## (Showing 8 of 35 unconstrained eigenvalues)
autoplot(rda, arrows = T)

Here we have a different history, the inertia values are now separated in constrained and unconstrained. The inertia constrained is the variance associated to the environmental variables and the unconstrained inertia is the amount of variance not explained by the variables (the residuals). In this example of the community structure we have a 21% of variance explained by the variables and a 87% not explained.

You can use more than one predictor matrix

rda2<- rda(mite~.+as.matrix(mite.pcnm[,1:3]), data=mite.env) #Here we take only the first tree columns of the matrix because they represents the majority of the variation
rda2
## Call: rda(formula = mite ~ SubsDens + WatrCont + Substrate + Shrub +
## Topo + as.matrix(mite.pcnm[, 1:3]), data = mite.env)
## 
##                Inertia Proportion Rank
## Total         9098.591      1.000     
## Constrained   3239.331      0.356   14
## Unconstrained 5859.260      0.644   35
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##   RDA1   RDA2   RDA3   RDA4   RDA5   RDA6   RDA7   RDA8   RDA9  RDA10  RDA11 
## 2738.9  295.8   87.2   43.0   27.2   14.2   10.7    9.3    6.4    3.0    1.6 
##  RDA12  RDA13  RDA14 
##    1.1    0.5    0.2 
## 
## Eigenvalues for unconstrained axes:
##  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8 
## 5295  165  103   66   64   42   32   22 
## (Showing 8 of 35 unconstrained eigenvalues)
autoplot(rda2, arrows = T)

Here, 35% of the variation is explained by the variables and 64% is unexplained. The first dimensions (RDA) captures the majority of the variation. The water content is important in the community structure, in the first RDA dimension the community is divided by high and low water content; in the second RDA dimension the community sows a a gradient in the shrub density.

Partition of the variance among groups of factors.

In ecology is interesting to see how the variation of the community is affected by the spatial factors and the environment factors, to see this we can make a partition in the factors and make an analysis of each part.

v.part<- varpart(mite, mite.env[,1:2], mite.pcnm[,1:3]) #this object have the response matrix, the environmental component (first two columns) and the spatial component (first tree columns).
v.part
## 
## Partition of variance in RDA 
## 
## Call: varpart(Y = mite, X = mite.env[, 1:2], mite.pcnm[, 1:3])
## 
## Explanatory tables:
## X1:  mite.env[, 1:2]
## X2:  mite.pcnm[, 1:3] 
## 
## No. of explanatory tables: 2 
## Total variation (SS): 627803 
##             Variance: 9098.6 
## No. of observations: 70 
## 
## Partition table:
##                      Df R.squared Adj.R.squared Testable
## [a+b] = X1            2   0.21738       0.19402     TRUE
## [b+c] = X2            3   0.06929       0.02698     TRUE
## [a+b+c] = X1+X2       5   0.26410       0.20661     TRUE
## Individual fractions                                    
## [a] = X1|X2           2                 0.17962     TRUE
## [b]                   0                 0.01439    FALSE
## [c] = X2|X1           3                 0.01259     TRUE
## [d] = Residuals                         0.79339    FALSE
## ---
## Use function 'rda' to test significance of fractions of interest
plot(v.part)

Here we have tw explanatory tables, X1 represents the environmental information and X2 the spatial information. The result of the individual fractions sows that the effect of the environment controlled by the space is about 18%, the effect of the space controlled by the environment is just of 1.2%, and we have an interaction of the two factors of 1.4%. The rest of the variation is not explained by the factors 79%.

Is possible to use more partitions of the data!

v.part2<- varpart(mite, ~SubsDens+ WatrCont, ~Substrate+ Shrub+ Topo,
                  mite.pcnm[,1:3], data=mite.env, transfo = "hel")# in this object we use the two continuous variables and the two categorical variables of the environment data and the spatial data to create tree explanatory tables. The Hellinger transformation is used because we have abundance data.

v.part2
## 
## Partition of variance in RDA 
## 
## Call: varpart(Y = mite, X = ~SubsDens + WatrCont, ~Substrate + Shrub +
## Topo, mite.pcnm[, 1:3], data = mite.env, transfo = "hel")
## Species transformation:  hellinger
## Explanatory tables:
## X1:  ~SubsDens + WatrCont
## X2:  ~Substrate + Shrub + Topo
## X3:  mite.pcnm[, 1:3] 
## 
## No. of explanatory tables: 3 
## Total variation (SS): 27.205 
##             Variance: 0.39428 
## No. of observations: 70 
## 
## Partition table:
##                       Df R.square Adj.R.square Testable
## [a+d+f+g] = X1         2  0.32677      0.30667     TRUE
## [b+d+e+g] = X2         9  0.40395      0.31454     TRUE
## [c+e+f+g] = X3         3  0.33631      0.30614     TRUE
## [a+b+d+e+f+g] = X1+X2 11  0.52650      0.43670     TRUE
## [a+c+d+e+f+g] = X1+X3  5  0.43681      0.39281     TRUE
## [b+c+d+e+f+g] = X2+X3 12  0.50517      0.40100     TRUE
## [a+b+c+d+e+f+g] = All 14  0.57806      0.47066     TRUE
## Individual fractions                                   
## [a] = X1 | X2+X3       2               0.06966     TRUE
## [b] = X2 | X1+X3       9               0.07785     TRUE
## [c] = X3 | X1+X2       3               0.03396     TRUE
## [d]                    0               0.01701    FALSE
## [e]                    0               0.05218    FALSE
## [f]                    0               0.05250    FALSE
## [g]                    0               0.16750    FALSE
## [h] = Residuals                        0.52934    FALSE
## Controlling 1 table X                                  
## [a+d] = X1 | X3        2               0.08668     TRUE
## [a+f] = X1 | X2        2               0.12216     TRUE
## [b+d] = X2 | X3        9               0.09486     TRUE
## [b+e] = X2 | X1        9               0.13003     TRUE
## [c+e] = X3 | X1        3               0.08614     TRUE
## [c+f] = X3 | X2        3               0.08645     TRUE
## ---
## Use function 'rda' to test significance of fractions of interest
plot(v.part2)

Here we have in the individual fractions that 6.9% of the variance is explained by the substrate density and the water content, 7.7% is explained by the substrate, shrub density and the topography and 3.3% by the spatial information; 52.9% of the variance of the community is not explained by the factors.

Non-Metric Multidimensial Scaling (NMDS)

Nonmetric Multidimensional Scaling (NMDS) tries to find a stable solution using several random starts and calculating a Dissimilarity Index. The Bray- Curtis distance is a measure of dissimilarity between communities, ranges from 0 to 1 where 0 indicates complete similarity and 1 indicates turnover (lost) of species or similarity, is useful to evaluate beta diversity.

nmds<- metaMDS(mite)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1491323 
## Run 1 stress 0.1689857 
## Run 2 stress 0.1640236 
## Run 3 stress 0.161169 
## Run 4 stress 0.1628545 
## Run 5 stress 0.1526241 
## Run 6 stress 0.1648139 
## Run 7 stress 0.1491411 
## ... Procrustes: rmse 0.001229656  max resid 0.006497035 
## ... Similar to previous best
## Run 8 stress 0.1655885 
## Run 9 stress 0.1686141 
## Run 10 stress 0.1768965 
## Run 11 stress 0.1524472 
## Run 12 stress 0.1515068 
## Run 13 stress 0.1608631 
## Run 14 stress 0.1562052 
## Run 15 stress 0.1569703 
## Run 16 stress 0.1509102 
## Run 17 stress 0.1564051 
## Run 18 stress 0.1660779 
## Run 19 stress 0.1719557 
## Run 20 stress 0.1655126 
## *** Solution reached
nmds
## 
## Call:
## metaMDS(comm = mite) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     wisconsin(sqrt(mite)) 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0.1491323 
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation, halfchange scaling 
## Species: expanded scores based on 'wisconsin(sqrt(mite))'

The metaNDMS function makes a square root transformation and a wisconsing double standarization of the data automaticly, if your data is already transformed you can tweak the function.

In this case is no convergence after the 20 runs, we can tweak the trymax argument to increase this number.

nmds<- metaMDS(mite, trymax = 30)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1491323 
## Run 1 stress 0.1521013 
## Run 2 stress 0.1614086 
## Run 3 stress 0.1694216 
## Run 4 stress 0.1583862 
## Run 5 stress 0.1722739 
## Run 6 stress 0.1621108 
## Run 7 stress 0.1638019 
## Run 8 stress 0.1509089 
## Run 9 stress 0.1491332 
## ... Procrustes: rmse 0.0003575702  max resid 0.00253382 
## ... Similar to previous best
## Run 10 stress 0.1569931 
## Run 11 stress 0.1491411 
## ... Procrustes: rmse 0.00125644  max resid 0.006526306 
## ... Similar to previous best
## Run 12 stress 0.1674287 
## Run 13 stress 0.1704731 
## Run 14 stress 0.1701325 
## Run 15 stress 0.1510155 
## Run 16 stress 0.1587817 
## Run 17 stress 0.1579184 
## Run 18 stress 0.1632758 
## Run 19 stress 0.1676698 
## Run 20 stress 0.1520774 
## *** Solution reached
nmds
## 
## Call:
## metaMDS(comm = mite, trymax = 30) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     wisconsin(sqrt(mite)) 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0.1491323 
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation, halfchange scaling 
## Species: expanded scores based on 'wisconsin(sqrt(mite))'
autoplot(nmds, geom= "text")

The plot shows in the x axis that communities at the right have significant more individuals in the taxa showed, and so the communities of the left.

Alternative plot for associations to a categorical variable

We are interested in to see how the Shrub density can explain the community structure

col<-c ("red","blue","green")
shape<-c(18,20,15)
plot(nmds$points, col=col[mite.env$Shrub], pch= shape[mite.env$Shrub],
     cex=1.2, main="Mite Shrub Groups", xlab="NMDS1", ylab="NMDS2")
ordispider(nmds, groups=mite.env$Shrub, label=T)

Now let’s see the topo variable

plot(nmds$points, col=col[mite.env$Topo], pch= shape[mite.env$Topo],
     cex=1.2, main="Mite Topo Groups", xlab="NMDS1", ylab="NMDS2")
ordispider(nmds, groups=mite.env$Topo, label=T)