Prepared for Eco-Stats March lab, at the last minute

In this lab we will use the latest version of the mvabund package (3.10.1) available on R-Forge. So start by downloading it:

install.packages("mvabund", repos="http://R-Forge.R-project.org")
## Installing package into '\\INFPWFS111.ad.unsw.edu.au/Staff012$/z3103495/R/win-library/3.1'
## (as 'lib' is unspecified)
library(mvabund)
## Warning: package 'mvabund' was built under R version 3.1.3

What is this trait modelling thing about?

Often ecologists collect data (especially abundance or presence-absence data) simultaneously across many taxa, with the intention of studying what occurs where (and why). This tutorial focusses on the why - methods to help us move towards a functional explanation of community abundances. McGill et al (2006) and Shipley (2010) argue passionately for the need for this.

A common strategy in any field looking at “why” is to look for predictor variables that can explain the response. In the case of studying why some taxa are abundant at a site while others are not, the relevant predictors are species traits. These come in a matrix, different traits in different columns, different taxa in different rows.

Here is a schematic diagram (from Warton et al 2015) describing how trait modelling fits in with other things you might know about - SDM, multivariate analysis…

Fourth corner diagram

Basically we are just adding an extra type of predictor to the model (traits). More details below.

Single site example (“CATS”)

Shipley et al (2006) proposed CATS as a method to study why (relative) abundances at a single site are larger for some taxa than for others, by relating abundances to a matrix of traits. Shipley used maximum entropy, but Warton et al (2015) recently showed this is equivalent to a Poisson GLM (“CATS regression”). So we can take count data at a site and use a GLM to regress it against traits.

Consider for example ant abundances at a site somewhere in SE Australia from Gibb et al (2015). Abundances are counts, we will use the first row of antTraits$abund, and regress as a function of several trait variables (stored as a data frame in antTraits$traits, log-transformed and regressed against body length):

data(antTraits)
y=c(t(antTraits$abund[1,])) #coercing the first row of antTraits$abund into a vector
singleSite = data.frame(y, antTraits$traits,check.names=F) #make a data frame from site abundances and traits
head(singleSite) #here is what the dataset looks like
##                             y Femur.length No.spines Pilosity Polymorphism
## Amblyopone.australis        0  -0.38278140         0        2            1
## Aphaenogaster.longiceps     0   0.15408686         2        1            0
## Camponotus.cinereus.amperei 0   0.01135992         0        1            1
## Camponotus.claripes         2   0.14233775         0        1            1
## Camponotus.consobrinus      1   0.12496098         0        1            1
## Camponotus.nigriceps        6   0.09985137         0        1            1
##                             Webers.length
## Amblyopone.australis            1.0942247
## Aphaenogaster.longiceps         0.6740655
## Camponotus.cinereus.amperei     0.6940704
## Camponotus.claripes             0.8847158
## Camponotus.consobrinus          1.2145690
## Camponotus.nigriceps            1.4607617
ft=manyglm(y~.,data=singleSite, family="negative.binomial") #fit a negative binomial regression
summary(ft,nBoot=1) #find which traits are important
## 
## Test statistics:
##               wald value Pr(>wald)
## (Intercept)        1.192         1
## Femur.length       0.826         1
## No.spines          1.611         1
## Pilosity1          0.001         1
## Pilosity2          0.607         1
## Pilosity3          1.880         1
## Polymorphism1      0.906         1
## Polymorphism2      0.134         1
## Webers.length      1.618         1
## Arguments: P-value calculated using 0 resampling iterations via pit.trap resampling.
## 
## Test statistic:  3.954, p-value: 1 
## Arguments: P-value calculated using 0 resampling iterations via pit.trap resampling.
plot(ft) #fit looks OK

Of the six traits in this model, the strongest effects seem to be due to Weber’s length (body length) and Pilosity (hairiness). nBoot=1 doesn’t do resampling - note that a key assumption for resampling is violated in this dataset: the species abundances are not independent! So I’m going to eyeball and wave my arms instead.

par(mfrow=c(1,2))
plot( log(singleSite$y+1)~singleSite$Weber )
plot( log(singleSite$y+1)~singleSite$Pilosity )

It looks like any pattern for body length is not overly strong. A suggestion that small ants (log(Weber’s length)<-0.5) are absent at the site. Hairy buggers (Pilosity=3) seem to have high abundance here though.

Try again on a different site

y=c(t(antTraits$abund[2,])) #coercing the first row of antTraits$abund into a vector
anotherSite = data.frame(y, antTraits$traits,check.names=F) #make a data frame from site abundances and traits

ft=manyglm(y~.,data=anotherSite, family="negative.binomial") #fit a negative binomial regression
summary(ft,nBoot=1) #find which traits are important
## 
## Test statistics:
##               wald value Pr(>wald)
## (Intercept)        1.044         1
## Femur.length       0.915         1
## No.spines          1.055         1
## Pilosity1          0.390         1
## Pilosity2          0.447         1
## Pilosity3          0.999         1
## Polymorphism1      1.967         1
## Polymorphism2      1.312         1
## Webers.length      1.774         1
## Arguments: P-value calculated using 0 resampling iterations via pit.trap resampling.
## 
## Test statistic:  3.436, p-value: 1 
## Arguments: P-value calculated using 0 resampling iterations via pit.trap resampling.
par(mfrow=c(1,2))
plot( log(anotherSite$y+1)~anotherSite$Weber )
plot( log(anotherSite$y+1)~anotherSite$Pilosity )

Pilosity seems to be on again but any association with Weber’s length is difficult to detect. We could try another site, and another, and we will keep finding we get different results, and want to try and settle on what the broader picture is.

Multiple sites - the “fourth corner problem”

Gibb et al (2015) actually sampled at 30 sites, and abundances across these sites form a matrix (different sites in different rows, different taxa in different columns). If there are predictors describing key differences across sites (“environmental variables”) this gives us a third matrix, as in the diagram at the top of the page. We would like to study how the abundances vary across sites and taxa as a function of environment and traits. Or more to the point, which traits best explain variation across taxa in their environmental response.

One of the early attempts to look at this problem called it the “fourth corner problem” (Legendre et al 1996), seeing the issue as one where we have three matrices and what we care about is the fourth (the one that relates environment to traits). Labelled (a) in the below diagram, where L=abundance, R=environmental variables, Q=traits (taking the notation from Doledec et al 1996).

  1. Fourth corner problem
  1. fourth corner regression model
Fourth corner diagram