Remember to set your working directory before you start. You also need to load today’s packages.
library(vegan)
# If you wish to use pairwiseAdonis(), you need to install the package from GitHub
# using the following steps:
# install.packages("devtools")
# library(devtools)
# install_github("pmartinezarbizu/pairwiseAdonis/pairwiseAdonis")
library(pairwiseAdonis)
Dr. Myriam Barbeau has provided data on a vegetation community (counts of 20 taxa) sampled over two summers (2018 and 2019) at four different sites. Two of these sites are reference sites, and two are restoration sites. Each site-year combination had either three or four transects, and there were 15 quadrats per transect. We are most interested in determining whether there is a significant site-type:year interaction: we predict that reference sites will not change over years, whereas restoration sites will. (Although ideally we would treat sites as a random effect nested in site type, with only two replicates per site type, we would have little power to detect a difference. Therefore, we will analyse sites as a fixed effect.)
First, read the data and convert to factors.
library(vegan)
d <- read.csv("Data/permanova_example.csv")
d$Site <- factor(d$Site)
d$Sitetype <- factor(d$Sitetype)
d$Year <- factor(d$Year)
d$Transect <- factor(d$Transect)
Then, check the data. (Output not shown to save space.)
summary(d)
Next, do some basic data tidying.
# Get species names (starting at column 9)
sp <- names(d)[9:ncol(d)]
# Drop species (from vector of species names to keep) that have all zero counts
sp <- sp[colSums(d[sp]) > 0]
# Make a unique transect ID nested in Site:Year
d$TransectID <- interaction(d$Site, d$Year, d$Transect, drop = TRUE)
Ideally, we would like to use this linear model:
Where:
The transects are the unit of replication to test the effect of Site, Year, and Site:Year.
If we were using PERMANOVA+ for PRIMER, this model would be
acceptable. However, we will be using the adonis2()
function from the vegan package to do our PERMANOVA. This
function cannot account for random effect structure, so instead of using
a nested data structure including Transect(Site:Year), we
will instead aggregate quadrats by taking the mean for each
transect.
Before we take the average for each transect, we will transform the count data for each species using a fourth-root transformation, which works similarly to a log transformation, but which can natively handle zeroes without needing to add a constant.
tran <- aggregate(d[sp]^0.25,
by = list(Site = d$Site,
Sitetype = d$Sitetype,
Year = d$Year,
TransectID = d$TransectID),
FUN = mean)
tran$SiteYear <- factor(paste(tran$Site, tran$Year, sep = "-"))
rownames(tran) <- tran$TransectID
# Create community matrix (of fourth-root transformed counts, then averaged for each transect)
comm <- as.matrix(tran[sp])
PERMANOVA takes a dissimilarity matrix as input. There are many ways of calculating dissimilarities between observations (e.g., Euclidean distance, Manhattan distance). We will use Bray–Curtis dissimilarity, which is well suited to community ecology because it emphasizes relative abundances and ignores shared absences.
When a model contains an interaction term and Type-III Sums of
Squares (SS) are requested using by=margin,
adonis2() includes only the interaction term in its ANOVA
table. So, to get a full ANOVA table, we will run the full model with
the interaction and report Type-I SS. Because order of terms matters in
Type-I SS with unbalanced designs, we should run it with both orderings
of terms to compare the output.
set.seed(1) # Set random seed to make pseudorandom results reproducible
adonis2(comm ~ Year * Site, data = tran, method = "bray", by = "terms")
set.seed(1)
adonis2(comm ~ Site * Year, data = tran, method = "bray", by = "terms")
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = comm ~ Year * Site, data = tran, method = "bray", by = "terms")
## Df SumOfSqs R2 F Pr(>F)
## Year 1 0.0275 0.00665 1.8079 0.197
## Site 3 3.7732 0.91168 82.6195 0.001 ***
## Year:Site 3 0.0336 0.00811 0.7350 0.549
## Residual 20 0.3045 0.07356
## Total 27 4.1387 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = comm ~ Site * Year, data = tran, method = "bray", by = "terms")
## Df SumOfSqs R2 F Pr(>F)
## Site 3 3.7732 0.91168 82.6195 0.001 ***
## Year 1 0.0275 0.00665 1.8079 0.197
## Site:Year 3 0.0336 0.00811 0.7350 0.549
## Residual 20 0.3045 0.07356
## Total 27 4.1387 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this case, both orderings give identical results, suggesting that there is no significant interaction, and that only the main effect of Site is significant.
To do posthoc pairwise testing, you can use
pairwise.adonis2() (or manually split up the data and
re-run adonis2()).
set.seed(1)
pairwise.adonis2(comm ~ Site, method = "bray", data = tran, by = "terms")
## $parent_call
## [1] "comm ~ Site , strata = Null , permutations 999"
##
## $RefE_vs_RefW
## Df SumOfSqs R2 F Pr(>F)
## Site 1 0.0166 0.0458 0.48 0.623
## Residual 10 0.3458 0.9542
## Total 11 0.3624 1.0000
##
## $RefE_vs_RestE
## Df SumOfSqs R2 F Pr(>F)
## Site 1 1.8408 0.908 118.44 0.003 **
## Residual 12 0.1865 0.092
## Total 13 2.0273 1.000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $RefE_vs_RestW
## Df SumOfSqs R2 F Pr(>F)
## Site 1 1.8109 0.90735 117.53 0.002 **
## Residual 12 0.1849 0.09265
## Total 13 1.9958 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $RefW_vs_RestE
## Df SumOfSqs R2 F Pr(>F)
## Site 1 1.95679 0.91548 129.98 0.001 ***
## Residual 12 0.18065 0.08452
## Total 13 2.13744 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $RefW_vs_RestW
## Df SumOfSqs R2 F Pr(>F)
## Site 1 1.92148 0.91476 128.78 0.002 **
## Residual 12 0.17905 0.08524
## Total 13 2.10052 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $RestE_vs_RestW
## Df SumOfSqs R2 F Pr(>F)
## Site 1 0.0018577 0.08598 1.317 0.272
## Residual 14 0.0197475 0.91402
## Total 15 0.0216052 1.00000
##
## attr(,"class")
## [1] "pwadstrata" "list"
The p-values from pairwise.adonis2() are
not adjusted for multiple testing. To adjust them, we can use the
p.adjust() function.
pval <- c(0.623, 0.003, 0.002, 0.001, 0.002, 0.272)
p.adjust(pval, method = "BH") # Use Benjamini-Hochberg adjustment
## [1] 0.6230 0.0045 0.0040 0.0040 0.0040 0.3264
After adjusting for multiple comparisons, all pairwise comparisons among sites are significant, except for those within site types (i.e., RefE_vs_RefW and RestE_vs_RestW).
To assess the contribution of each species to the average Bray-Curtis dissimilarity between site types (Reference vs. Restoration), we will use SIMPER.
(simp <- with(tran, simper(comm, Sitetype)))
## cumulative contributions of most influential species:
##
## $Reference_Restoration
## Spartina.patens Spartina.alterniflora
## 0.6026863 0.8370458
summary(simp)
##
## Contrast: Reference_Restoration
##
## average sd ratio ava avb cumsum p
## Spartina.patens 0.44830 0.05903 7.59400 3.97400 0.01720 0.603 0.001 ***
## Spartina.alterniflora 0.17430 0.07029 2.48000 1.06700 2.60300 0.837 0.001 ***
## Triglochin.maritima 0.05190 0.03768 1.37700 0.48500 0.00000 0.907 0.001 ***
## Glaux.maritima 0.02350 0.02205 1.06700 0.21800 0.00000 0.938 0.001 ***
## Atriplex.spp. 0.01220 0.02077 0.58500 0.11300 0.00420 0.955 0.003 **
## Limonium.carolinianum 0.00870 0.01241 0.70200 0.08500 0.00000 0.967 0.001 ***
## Solidago.simpervirens 0.00790 0.00914 0.85900 0.06700 0.00000 0.977 0.001 ***
## Juncus.gerardii 0.00430 0.00987 0.43400 0.04400 0.00000 0.983 0.001 ***
## Plantago.maritima 0.00410 0.00714 0.57100 0.03700 0.00000 0.988 0.001 ***
## Salicornia.sp. 0.00360 0.00645 0.55700 0.00600 0.02920 0.993 1.000
## Suaeda.spp. 0.00230 0.00411 0.55500 0.00000 0.02010 0.996 1.000
## Poaceae..other. 0.00180 0.00411 0.42700 0.01500 0.00000 0.999 0.001 ***
## Polygonum.sp. 0.00060 0.00239 0.25600 0.00000 0.00550 0.999 1.000
## Spergularia.canadensis 0.00050 0.00182 0.25600 0.00000 0.00420 1.000 1.000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
The two species that contribute most to differences among site types are Spartina spp.; however, most taxa contribute significantly.
Significant PERMANOVA results can be driven by differences in dispersion, not just differences in group centroids. So, to test the assumption of multivariate homogeneity of group dispersions (variances), we will use PERMDISP2, which is a multivariate analogue of Levene’s test for homogeneity of variances. If \(p>0.05\), we cannot reject the null hypothesis of homogeneous dispersion.
dist_matrix <- vegdist(comm, method = "bray")
dispersion <- betadisper(dist_matrix, tran$Site)
permutest(dispersion)
boxplot(dispersion)
##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 3 0.094378 0.0314594 6.0124 999 0.001 ***
## Residuals 24 0.125577 0.0052324
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The null hypothesis of homogeneity of variances is violated: there appears to be much more variance in reference sites than in restoration sites. We will examine this further with an nMDS plot.
To plot the community data, we will use nMDS (non-metric multidimensional scaling), an ordination technique that represents samples in a low-dimensional space while trying to preserve the rank order of their pairwise dissimilarities. An nMDS is fitted iteratively using numerical optimization, adjusting the configuration of points to minimize a stress function (unlike a PCA, which is obtained algebraically).
set.seed(1)
(nmds <- metaMDS(comm, distance = "bray"))
##
## Call:
## metaMDS(comm = comm, distance = "bray", trace = 0)
##
## global Multidimensional Scaling using monoMDS
##
## Data: comm
## Distance: bray
##
## Dimensions: 2
## Stress: 0.0125116
## Stress type 1, weak ties
## Best solution was repeated 4 times in 20 tries
## The best solution was from try 3 (random start)
## Scaling: centring, PC rotation, halfchange scaling
## Species: expanded scores based on 'comm'
Stress is a measure of how well the low-dimensional ordination
preserves the rank order of dissimilarities among samples. According to
Quinn & Keough (2024), nMDS plots should only be interpreted if
stress is <0.2, and ideally it should be <0.15. Stress > 0.3
indicates that the fit is no better than arbitrary. Stress can be
reduced by adding more dimensions (i.e., k=3 instead of the
default k=2). Our stress value of 0.013 is very good.
Next, we will determine how each of the nMDS scores (i.e., the axes on the nMDS plot we will produce) explains the (transformed and aggregated) species counts for each transect.
(fit <- envfit(nmds, comm))
##
## ***VECTORS
##
## NMDS1 NMDS2 r2 Pr(>r)
## Spartina.patens -0.99135 -0.13124 0.9875 0.001 ***
## Spartina.alterniflora 0.92944 0.36898 0.9395 0.001 ***
## Plantago.maritima -0.30130 -0.95353 0.2568 0.026 *
## Triglochin.maritima -0.19466 -0.98087 0.9046 0.001 ***
## Glaux.maritima -0.20071 -0.97965 0.7659 0.001 ***
## Atriplex.spp. -0.39476 0.91878 0.1755 0.101
## Salicornia.sp. 0.46508 0.88527 0.0695 0.388
## Suaeda.spp. 0.36124 -0.93247 0.1276 0.149
## Limonium.carolinianum -0.11165 -0.99375 0.6883 0.001 ***
## Solidago.simpervirens -0.24357 0.96988 0.5753 0.001 ***
## Juncus.gerardii -0.08001 -0.99679 0.6106 0.002 **
## Polygonum.sp. 0.31637 0.94864 0.0327 0.514
## Spergularia.canadensis 0.31637 0.94864 0.0327 0.514
## Poaceae..other. -0.21336 0.97697 0.1443 0.152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
Finally, we will create the nMDS plot with species vectors.
# Set plotting symbols
site_type <- ifelse(grepl("^Ref", tran$Site), "Ref", "Rest")
side <- ifelse(grepl("E$", tran$Site), "E", "W")
type_col <- c("Ref" = "forestgreen",
"Rest" = "magenta3")
pch_vals <- c("E" = 21, # circle
"W" = 24) # triangle
pt_col <- type_col[site_type] # border colour = type
pt_bg <- ifelse(tran$Year == "2018", pt_col, "white") # filled vs open
# Create the nMDS plot and add the points for each site
plot(nmds, display = "sites", type = "n", main = NULL)
points(nmds,
display = "sites",
pch = pch_vals[side],
col = pt_col,
bg = pt_bg)
# Optionally, add a minimum spanning tree, ordiellipses, and/or an ordispider
#mst <- spantree(dist_matrix)
#lines(mst, ord = nmds, col = adjustcolor("grey", alpha = 0.4))
#ordiellipse(nmds, groups=tran$Sitetype, kind = "sd", draw = "polygon", lwd = 1.5,
# border = type_col, col = adjustcolor(type_col, alpha.f = 0.15))
#ordispider(nmds, groups=tran$Sitetype, lwd = 0.7, col = adjustcolor(type_col, alpha.f = 0.3))
# Add species vectors from envfit()
plot(fit, p.max = 0.05, col = "blue", cex=0.5)
# Create the legend
legend_df <- expand.grid(Year = c("2018", "2019"),
SiteType = c("Ref", "Rest"),
Side = c("E", "W"),
stringsAsFactors = FALSE)
legend_df$label <- paste(legend_df$SiteType, legend_df$Side, legend_df$Year)
legend_df$pch <- c("E" = 21, "W" = 24)[legend_df$Side]
legend_df$col <- c("Ref" = "forestgreen", "Rest" = "magenta3")[legend_df$SiteType]
legend_df$pt.bg <- ifelse(legend_df$Year == "2019", legend_df$col, "white")
legend("bottomright",
legend = legend_df$label,
pch = legend_df$pch,
col = legend_df$col,
pt.bg = legend_df$pt.bg,
pt.cex = 0.5,
cex = 0.5,
title = "Site-Year",
bty = "n")
The nMDS plot shows that reference and restoration sites are clearly separated along NMDS1. There is little evidence of a year effect or a site-year interaction. Reference sites appear more variable than restored sites, indicating greater within-group dispersion.