Multivariate analyses in R

By C Nell

Types of questions

Do groups differ in composition?
Does community structure vary among regions or over time?
Do environmental variables explain community patterns?
Which species are responsible for differences among groups?

Multivariate analysis of ecological communities with vegan

install.packages('vegan')
library(vegan) ##Community ecology: ordination, disversity & dissimilarities

Dataset

Bird abundances from 32 different plots (rows), 12 of which have 1 tree species (DIVERSITY = M) and 20 with 4 tree species (DIVERSITY = P).
Tree composition: there are a total of 6 possible tree species (treecomp), each signified with a letter A to F. Bird abundances are totalled according to their feeding guild (columns).

Get data from internet:

birds<-read.csv('https://raw.githubusercontent.com/collnell/lab-demo/master/bird_by_fg.csv')
trees<-read.csv('https://raw.githubusercontent.com/collnell/lab-demo/master/tree_comp.csv')

Or from your computer:

setwd("/Users/colleennell/Dropbox/Projects/Mexico/R") #change to data folder
birds<-read.csv('bird_by_fg.csv')
head(birds)

##   DIVERSITY PLOT CA FR GR HE IN NE OM
## 1         M    3  0  0  0  0  2  0  0
## 2         M    9  0  0  2  0  6  0  4
## 3         M   12  0  0  0  0  2  0  2
## 4         M   17  0  0  0  0  7  0  4
## 5         M   20  0  0  0  0  1  0  4
## 6         M   21  0  0  3  0 14  0  7

trees<-read.csv('tree_comp.csv')
head(trees)

##   PLOT comp A B C D E F row col
## 1    3    D 0 0 0 1 0 0   3   1
## 2    9    A 1 0 0 0 0 0   2   2
## 3   12    E 0 0 0 0 1 0   5   2
## 4   17    F 0 0 0 0 0 1   3   3
## 5   20    A 1 0 0 0 0 0   6   3
## 6   21    B 0 1 0 0 0 0   7   3

Questions: Is C. pentandara (B) associated with variation in bird species composition? Does feeding guild composition differ between monoculture and polyculture plots?

MANOVA (Multivariate analysis of variance)

Parametric test for differences between independent groups for multiple continuous dependent variables. Like ANOVA for many response variables. Requires variables to be fewer than number of smaples.

Is C. pentandara (B) associated with variation in bird species composition? Or D & F (both Fabaceae)?

bird.matrix<-as.matrix(birds[,3:9])##response variables in a sample x species matrix
trees$B<-as.factor(trees$B)

bird.manova<-manova(bird.matrix~as.factor(B), data=trees) ##manova test
summary(bird.manova)

##              Df  Pillai approx F num Df den Df  Pr(>F)  
## as.factor(B)  1 0.39147   2.2056      7     24 0.07027 .
## Residuals    30                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show univariate results:

summary.aov(bird.manova)

Assumptions of MANOVA

Normal distribution
Linearity
Homogeneity of variances
Homogeneity of covariances

Problem: Most ecological data is overdispersed, has many 0’s or rare species, unequal sample sizes.
Solution: Dissimilarity coefficients, permutation tests

PERMANOVA: Permutational multivariate analysis of variance

Non-paramentric, based on dissimilarities. Allows for partitioning of variability, similar to ANOVA, allowing for complex design (multiple factors, nested design, interactions, covariates). Uses permutation to compute F-statistic (pseudo-F).
Interactive app demonstrating permutation tests

Based on Legendre & Anderson (1999, Ecological Monographs) and Anderson (2001, Austral Ecology).

Null hypothesis: Groups do not differ in spread or positioni n multivaraite space.

1. Transform or standardize data

Use square root or proportions to minimize influence of most abundant groups.

bird.mat<-sqrt(bird.matrix)#square root transform
#bird.prop<-decostand(bird.matrix, method="total")

2. Calculate ecological resemblance

Quantify pairwise compositional dissimilarity between sites based on species occurances.
- Bray-Curtis dissimilarity (abundance weighted)
- Jaccard (presence/absence)
- Gower’s (non-continuous variables)

Dissimilarity: 0 = sites are indentical, 1 = sites do not share any species

Create a dissimilarity matrix:

bird.dist<-vegdist(bird.mat, method='bray')

3. perMANOVA

Do monoculture and polyculture plots differ in feeding guild composition?

set.seed(36) #reproducible results

bird.div<-adonis2(bird.dist~DIVERSITY, data=birds, permutations = 999, method="bray", strata="PLOT")
bird.div

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = bird.dist ~ DIVERSITY, data = birds, permutations = 999, method = "bray", strata = "PLOT")
##           Df SumOfSqs      F Pr(>F)   
## DIVERSITY  1  0.32857 4.1585  0.008 **
## Residual  30  2.37033                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

strata = ‘exchangeable units’ for permutation. Important for nested design.

4. Multivariate dispersion

The average distance to group centroid. Used as a measure of multivariate beta diversity.

dispersion<-betadisper(bird.dist, group=birds$DIVERSITY)
permutest(dispersion)

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df  Sum Sq   Mean Sq      F N.Perm Pr(>F)
## Groups     1 0.00369 0.0036924 0.2231    999  0.638
## Residuals 30 0.49659 0.0165530

plot(dispersion, hull=FALSE, ellipse=TRUE) ##sd ellipse

NMDS

Non-metric multi-dimensional scaling. Unconstrained ordination. See (https://jonlefcheck.net/2012/10/24/nmds-tutorial-in-r/).
The goal of NMDS is to represent the original position of communities in multidimensional space as accurately as possible using a reduced number of dimensions that can be easily visualized. NMDS uses rank orders to preserve distances among objects thus can accomodate a variety of data types.

Configure samples in 2-dimensional space:

birdMDS<-metaMDS(bird.mat, distance="bray", k=2, trymax=35, autotransform=TRUE) ##k is the number of dimensions
birdMDS ##metaMDS takes eaither a distance matrix or your community matrix (then requires method for 'distance=')

stressplot(birdMDS)

Stress: similarity of observed distance to ordination distance. < 0.15 to indidates acceptable fit.

install.packages('ggplot2') ##plotting package
library(ggplot2)

##pull points from MDS
NMDS1 <- birdMDS$points[,1] ##also found using scores(birdMDS)
NMDS2 <- birdMDS$points[,2]
bird.plot<-cbind(birds, NMDS1, NMDS2, trees)

#plot ordination
p<-ggplot(bird.plot, aes(NMDS1, NMDS2, color=DIVERSITY))+
  geom_point(position=position_jitter(.1), shape=3)+##separates overlapping points
  stat_ellipse(type='t',size =1)+ ##draws 95% confidence interval ellipses
  theme_minimal()
p

Add labels for tree species composition:

#plot ordination
p<-ggplot(bird.plot, aes(NMDS1, NMDS2, color=DIVERSITY))+
  stat_ellipse(type='t',size =1)+
  theme_minimal()+geom_text(data=bird.plot,aes(NMDS1, NMDS2, label=comp), position=position_jitter(.35))+
  annotate("text", x=min(NMDS1), y=min(NMDS2), label=paste('Stress =',round(birdMDS$stress,3))) #add stress to plot
p

Fit vectors to ordination

Which envornmental variables are correlated with the ordination?

fit<-envfit(birdMDS, bird.mat)
arrow<-data.frame(fit$vectors$arrows,R = fit$vectors$r, P = fit$vectors$pvals)
arrow$FG <- rownames(arrow)
arrow.p<-filter(arrow, P <= 0.05)

p<-ggplot(data=bird.plot, aes(NMDS1, NMDS2))+
  geom_point(data=bird.plot, aes(NMDS1, NMDS2, color=DIVERSITY),position=position_jitter(.1))+##separates overlapping points
  stat_ellipse(aes(fill=DIVERSITY), alpha=.2,type='t',size =1, geom="polygon")+ ##changes shading on ellipses
  theme_minimal()+
  geom_segment(data=arrow.p, aes(x=0, y=0, xend=NMDS1, yend=NMDS2, label=FG, lty=FG), arrow=arrow(length=unit(.2, "cm")*arrow.p$R)) ##add arrows (scaled by R-squared value)

p

Show gradient of insectivores:

ordisurf(birdMDS, bird.mat[,'IN'], bubble=TRUE)##bubble size reflects abundance of insectivores

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(x1, x2, k = 10, bs = "tp", fx = FALSE)
## 
## Estimated degrees of freedom:
## 5.6  total = 6.6 
## 
## REML score: 35.72947

Resources:
GUSTA ME - Provides several ‘wizards’ in choosing the correct statistical test, walkthrough examples of multivariate analyses, and guide to the major types of analyses.