Starting off
Read, check, tidy, and modify the data
PERMANOVA (permutational multivariate analysis of variance)
Follow-up analyses

Starting off

Remember to set your working directory before you start. You also need to load today’s packages.

library(vegan)
# If you wish to use pairwiseAdonis(), you need to install the package from GitHub
# using the following steps:
# install.packages("devtools")
# library(devtools)
# install_github("pmartinezarbizu/pairwiseAdonis/pairwiseAdonis")
library(pairwiseAdonis)

Read, check, tidy, and modify the data

Dr. Myriam Barbeau has provided data on a vegetation community (counts of 20 taxa) sampled over two summers (2018 and 2019) at four different sites. Two of these sites are reference sites, and two are restoration sites. Each site-year combination had either three or four transects, and there were 15 quadrats per transect. We are most interested in determining whether there is a significant site-type:year interaction: we predict that reference sites will not change over years, whereas restoration sites will. (Although ideally we would treat sites as a random effect nested in site type, with only two replicates per site type, we would have little power to detect a difference. Therefore, we will analyse sites as a fixed effect.)

First, read the data and convert to factors.

library(vegan)
d <- read.csv("Data/permanova_example.csv")
d$Site <- factor(d$Site)
d$Sitetype <- factor(d$Sitetype)
d$Year <- factor(d$Year)
d$Transect <- factor(d$Transect)

Then, check the data. (Output not shown to save space.)

summary(d)

Next, do some basic data tidying.

# Get species names (starting at column 9)
sp <- names(d)[9:ncol(d)]

# Drop species (from vector of species names to keep) that have all zero counts
sp <- sp[colSums(d[sp]) > 0]

# Make a unique transect ID nested in Site:Year
d$TransectID <- interaction(d$Site, d$Year, d$Transect, drop = TRUE)

PERMANOVA (permutational multivariate analysis of variance)

Our desired model

Ideally, we would like to use this linear model:

Vegetation community ~ Site + Year + Site:Year + Transect(Site:Year) + error

Where:

The vegetation community data are counts of 20 taxa
Site is a fixed factor, four levels (two sites in each site type: Reference & Restoration),
Year is a fixed factor, two levels (2018 & 2019),
Transect is a random factor, 3 or 4 levels, nested in Site and Year,
Error is the quadrats, 15 replicate quadrats per transect (so, nested in everything).

The transects are the unit of replication to test the effect of Site, Year, and Site:Year.

If we were using PERMANOVA+ for PRIMER, this model would be acceptable. However, we will be using the adonis2() function from the vegan package to do our PERMANOVA. This function cannot account for random effect structure, so instead of using a nested data structure including Transect(Site:Year), we will instead aggregate quadrats by taking the mean for each transect.

Aggregate the data

Before we take the average for each transect, we will transform the count data for each species using a fourth-root transformation, which works similarly to a log transformation, but which can natively handle zeroes without needing to add a constant.

tran <- aggregate(d[sp]^0.25,
                  by = list(Site = d$Site,
                            Sitetype = d$Sitetype,
                            Year = d$Year,
                            TransectID = d$TransectID),
                  FUN = mean)
tran$SiteYear <- factor(paste(tran$Site, tran$Year, sep = "-"))
rownames(tran) <- tran$TransectID
# Create community matrix (of fourth-root transformed counts, then averaged for each transect)
comm <- as.matrix(tran[sp])

Run the PERMANOVA

PERMANOVA takes a dissimilarity matrix as input. There are many ways of calculating dissimilarities between observations (e.g., Euclidean distance, Manhattan distance). We will use Bray–Curtis dissimilarity, which is well suited to community ecology because it emphasizes relative abundances and ignores shared absences.

When a model contains an interaction term and Type-III Sums of Squares (SS) are requested using by=margin, adonis2() includes only the interaction term in its ANOVA table. So, to get a full ANOVA table, we will run the full model with the interaction and report Type-I SS. Because order of terms matters in Type-I SS with unbalanced designs, we should run it with both orderings of terms to compare the output.

set.seed(1)       # Set random seed to make pseudorandom results reproducible
adonis2(comm ~ Year * Site, data = tran, method = "bray", by = "terms")
set.seed(1)
adonis2(comm ~ Site * Year, data = tran, method = "bray", by = "terms")

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = comm ~ Year * Site, data = tran, method = "bray", by = "terms")
##           Df SumOfSqs      R2       F Pr(>F)    
## Year       1   0.0275 0.00665  1.8079  0.197    
## Site       3   3.7732 0.91168 82.6195  0.001 ***
## Year:Site  3   0.0336 0.00811  0.7350  0.549    
## Residual  20   0.3045 0.07356                   
## Total     27   4.1387 1.00000                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = comm ~ Site * Year, data = tran, method = "bray", by = "terms")
##           Df SumOfSqs      R2       F Pr(>F)    
## Site       3   3.7732 0.91168 82.6195  0.001 ***
## Year       1   0.0275 0.00665  1.8079  0.197    
## Site:Year  3   0.0336 0.00811  0.7350  0.549    
## Residual  20   0.3045 0.07356                   
## Total     27   4.1387 1.00000                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this case, both orderings give identical results, suggesting that there is no significant interaction, and that only the main effect of Site is significant.

Post-hoc pairwise testing

To do posthoc pairwise testing, you can use pairwise.adonis2() (or manually split up the data and re-run adonis2()).

set.seed(1)
pairwise.adonis2(comm ~ Site, method = "bray", data = tran, by = "terms")

## $parent_call
## [1] "comm ~ Site , strata = Null , permutations 999"
## 
## $RefE_vs_RefW
##          Df SumOfSqs     R2    F Pr(>F)
## Site      1   0.0166 0.0458 0.48  0.623
## Residual 10   0.3458 0.9542            
## Total    11   0.3624 1.0000            
## 
## $RefE_vs_RestE
##          Df SumOfSqs    R2      F Pr(>F)   
## Site      1   1.8408 0.908 118.44  0.003 **
## Residual 12   0.1865 0.092                 
## Total    13   2.0273 1.000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $RefE_vs_RestW
##          Df SumOfSqs      R2      F Pr(>F)   
## Site      1   1.8109 0.90735 117.53  0.002 **
## Residual 12   0.1849 0.09265                 
## Total    13   1.9958 1.00000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $RefW_vs_RestE
##          Df SumOfSqs      R2      F Pr(>F)    
## Site      1  1.95679 0.91548 129.98  0.001 ***
## Residual 12  0.18065 0.08452                  
## Total    13  2.13744 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $RefW_vs_RestW
##          Df SumOfSqs      R2      F Pr(>F)   
## Site      1  1.92148 0.91476 128.78  0.002 **
## Residual 12  0.17905 0.08524                 
## Total    13  2.10052 1.00000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $RestE_vs_RestW
##          Df  SumOfSqs      R2     F Pr(>F)
## Site      1 0.0018577 0.08598 1.317  0.272
## Residual 14 0.0197475 0.91402             
## Total    15 0.0216052 1.00000             
## 
## attr(,"class")
## [1] "pwadstrata" "list"

The p-values from pairwise.adonis2() are not adjusted for multiple testing. To adjust them, we can use the p.adjust() function.

pval <- c(0.623, 0.003, 0.002, 0.001, 0.002, 0.272)
p.adjust(pval, method = "BH") # Use Benjamini-Hochberg adjustment

## [1] 0.6230 0.0045 0.0040 0.0040 0.0040 0.3264

After adjusting for multiple comparisons, all pairwise comparisons among sites are significant, except for those within site types (i.e., RefE_vs_RefW and RestE_vs_RestW).

Follow-up analyses

SIMPER (similarity percentage)

To assess the contribution of each species to the average Bray-Curtis dissimilarity between site types (Reference vs. Restoration), we will use SIMPER.

(simp <- with(tran, simper(comm, Sitetype)))

## cumulative contributions of most influential species:
## 
## $Reference_Restoration
##       Spartina.patens Spartina.alterniflora 
##             0.6026863             0.8370458

summary(simp)

## 
## Contrast: Reference_Restoration 
## 
##                        average      sd   ratio     ava     avb cumsum     p    
## Spartina.patens        0.44830 0.05903 7.59400 3.97400 0.01720  0.603 0.001 ***
## Spartina.alterniflora  0.17430 0.07029 2.48000 1.06700 2.60300  0.837 0.001 ***
## Triglochin.maritima    0.05190 0.03768 1.37700 0.48500 0.00000  0.907 0.001 ***
## Glaux.maritima         0.02350 0.02205 1.06700 0.21800 0.00000  0.938 0.001 ***
## Atriplex.spp.          0.01220 0.02077 0.58500 0.11300 0.00420  0.955 0.003 ** 
## Limonium.carolinianum  0.00870 0.01241 0.70200 0.08500 0.00000  0.967 0.001 ***
## Solidago.simpervirens  0.00790 0.00914 0.85900 0.06700 0.00000  0.977 0.001 ***
## Juncus.gerardii        0.00430 0.00987 0.43400 0.04400 0.00000  0.983 0.001 ***
## Plantago.maritima      0.00410 0.00714 0.57100 0.03700 0.00000  0.988 0.001 ***
## Salicornia.sp.         0.00360 0.00645 0.55700 0.00600 0.02920  0.993 1.000    
## Suaeda.spp.            0.00230 0.00411 0.55500 0.00000 0.02010  0.996 1.000    
## Poaceae..other.        0.00180 0.00411 0.42700 0.01500 0.00000  0.999 0.001 ***
## Polygonum.sp.          0.00060 0.00239 0.25600 0.00000 0.00550  0.999 1.000    
## Spergularia.canadensis 0.00050 0.00182 0.25600 0.00000 0.00420  1.000 1.000    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999

The two species that contribute most to differences among site types are Spartina spp.; however, most taxa contribute significantly.

PERMDISP (permutational analysis of multivariate dispersions)

Significant PERMANOVA results can be driven by differences in dispersion, not just differences in group centroids. So, to test the assumption of multivariate homogeneity of group dispersions (variances), we will use PERMDISP2, which is a multivariate analogue of Levene’s test for homogeneity of variances. If \(p>0.05\), we cannot reject the null hypothesis of homogeneous dispersion.

dist_matrix <- vegdist(comm, method = "bray")
dispersion <- betadisper(dist_matrix, tran$Site)
permutest(dispersion)
boxplot(dispersion)

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df   Sum Sq   Mean Sq      F N.Perm Pr(>F)    
## Groups     3 0.094378 0.0314594 6.0124    999  0.001 ***
## Residuals 24 0.125577 0.0052324                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The null hypothesis of homogeneity of variances is violated: there appears to be much more variance in reference sites than in restoration sites. We will examine this further with an nMDS plot.

nMDS (non-metric multidimensional scaling)

To plot the community data, we will use nMDS (non-metric multidimensional scaling), an ordination technique that represents samples in a low-dimensional space while trying to preserve the rank order of their pairwise dissimilarities. An nMDS is fitted iteratively using numerical optimization, adjusting the configuration of points to minimize a stress function (unlike a PCA, which is obtained algebraically).

set.seed(1)
(nmds <- metaMDS(comm, distance = "bray"))

## 
## Call:
## metaMDS(comm = comm, distance = "bray", trace = 0) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     comm 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0.0125116 
## Stress type 1, weak ties
## Best solution was repeated 4 times in 20 tries
## The best solution was from try 3 (random start)
## Scaling: centring, PC rotation, halfchange scaling 
## Species: expanded scores based on 'comm'

Stress is a measure of how well the low-dimensional ordination preserves the rank order of dissimilarities among samples. According to Quinn & Keough (2024), nMDS plots should only be interpreted if stress is <0.2, and ideally it should be <0.15. Stress > 0.3 indicates that the fit is no better than arbitrary. Stress can be reduced by adding more dimensions (i.e., k=3 instead of the default k=2). Our stress value of 0.013 is very good.

Next, we will determine how each of the nMDS scores (i.e., the axes on the nMDS plot we will produce) explains the (transformed and aggregated) species counts for each transect.

(fit <- envfit(nmds, comm))

## 
## ***VECTORS
## 
##                           NMDS1    NMDS2     r2 Pr(>r)    
## Spartina.patens        -0.99135 -0.13124 0.9875  0.001 ***
## Spartina.alterniflora   0.92944  0.36898 0.9395  0.001 ***
## Plantago.maritima      -0.30130 -0.95353 0.2568  0.026 *  
## Triglochin.maritima    -0.19466 -0.98087 0.9046  0.001 ***
## Glaux.maritima         -0.20071 -0.97965 0.7659  0.001 ***
## Atriplex.spp.          -0.39476  0.91878 0.1755  0.101    
## Salicornia.sp.          0.46508  0.88527 0.0695  0.388    
## Suaeda.spp.             0.36124 -0.93247 0.1276  0.149    
## Limonium.carolinianum  -0.11165 -0.99375 0.6883  0.001 ***
## Solidago.simpervirens  -0.24357  0.96988 0.5753  0.001 ***
## Juncus.gerardii        -0.08001 -0.99679 0.6106  0.002 ** 
## Polygonum.sp.           0.31637  0.94864 0.0327  0.514    
## Spergularia.canadensis  0.31637  0.94864 0.0327  0.514    
## Poaceae..other.        -0.21336  0.97697 0.1443  0.152    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999

Finally, we will create the nMDS plot with species vectors.

# Set plotting symbols
site_type <- ifelse(grepl("^Ref", tran$Site), "Ref", "Rest")
side      <- ifelse(grepl("E$", tran$Site), "E", "W")

type_col <- c("Ref" = "forestgreen",
              "Rest" = "magenta3")

pch_vals <- c("E" = 21,   # circle
              "W" = 24)   # triangle

pt_col <- type_col[site_type]                          # border colour = type
pt_bg  <- ifelse(tran$Year == "2018", pt_col, "white") # filled vs open

# Create the nMDS plot and add the points for each site
plot(nmds, display = "sites", type = "n", main = NULL)
points(nmds,
       display = "sites",
       pch = pch_vals[side],
       col = pt_col,
       bg = pt_bg)

# Optionally, add a minimum spanning tree, ordiellipses, and/or an ordispider
#mst <- spantree(dist_matrix)
#lines(mst, ord = nmds, col = adjustcolor("grey", alpha = 0.4))
#ordiellipse(nmds, groups=tran$Sitetype, kind = "sd", draw = "polygon", lwd = 1.5,
#            border = type_col, col = adjustcolor(type_col, alpha.f = 0.15))
#ordispider(nmds, groups=tran$Sitetype, lwd = 0.7, col = adjustcolor(type_col, alpha.f = 0.3))

# Add species vectors from envfit()
plot(fit, p.max = 0.05, col = "blue", cex=0.5)

# Create the legend
legend_df <- expand.grid(Year = c("2018", "2019"),
                         SiteType = c("Ref", "Rest"),
                         Side = c("E", "W"),
                         stringsAsFactors = FALSE)

legend_df$label <- paste(legend_df$SiteType, legend_df$Side, legend_df$Year)
legend_df$pch <- c("E" = 21, "W" = 24)[legend_df$Side]
legend_df$col <- c("Ref" = "forestgreen", "Rest" = "magenta3")[legend_df$SiteType]
legend_df$pt.bg <- ifelse(legend_df$Year == "2019", legend_df$col, "white")

legend("bottomright",
       legend = legend_df$label,
       pch = legend_df$pch,
       col = legend_df$col,
       pt.bg = legend_df$pt.bg,
       pt.cex = 0.5,
       cex = 0.5,
       title = "Site-Year",
       bty = "n")

The nMDS plot shows that reference and restoration sites are clearly separated along NMDS1. There is little evidence of a year effect or a site-year interaction. Reference sites appear more variable than restored sites, indicating greater within-group dispersion.

BIOL 5473 Bonus Tutorial