NMDS Ordination

Before you begin

Save data as csv files (these are easy for R to work with)
Set working directory: this is the folder that R will look in when searching for files (Session, Set working directory, Choose directory… OR Control+Shift+H for windows Command+Shift+H for mac)
Install or load required packages: if you get an error Error in library(vegan) : there is no package called vegan it is because the package is not installed. Use install.packages("vegan") and then run library(vegan) again.

library(vegan) # for vegetation and community analysis
library(dplyr) # for data manipulation

## Warning: package 'dplyr' was built under R version 3.4.4

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

It is ok to see these warnings! R is just telling you that it has read the packages correctly but they were built in different versions of R than the one you are currently using, it will not effect the function of the package!

Import and transform data

** The data used for this example is part of the vegan package and does not need to be imported or transformed**

Before you can do any sort of ordination analysis you will need to 1. Import data from csv file 2. Transform your data from long format (multiple lines for each site) into wide format (single row for each site, species as columns)

dune <- read.csv(file = "dune.csv", header = TRUE, row.names = 1) # header = TRUE for data with column names, row.names = 1 for data where the site name is in the first column
dune.env <- read.csv(file = "dune_env.csv", header = T, row.names = 1)

#alternatively for this example you can import using
data(dune)
data("dune.env")

The data used in this example doesn’t need transformation as it is already in wide format but this is how you can do it.

After reading in the raw data, there is many rows with data for the same species but from different samples or sites, thsi needs to be transformed it into wide format. To do that you need to create a matrix that is as long as the number of samples and as wide as the number of species observed.

taxa <- unique(data$Species) #creates list of each unique species
samples <- sort(unique(data$Sample.ID)) #creates list of each unique site or sample

#make empty matrix "data1" ready to fill in
data1 <- matrix(nrow = length(samples), ncol = length(taxa), dimnames = list(samples,taxa))

for(r in 1:nrow(LL.data)){
  samp <- LL.data[r, 1]
  tax <- LL.data[r, 2]
  data1[samp,tax] <- LL.data[r, 3]
} # 1, 2, 3 here relate the the column number in the raw data in which the sample name, species name and data are in

data1[is.na(data1)] <- 0   #convert NA's to 0

Now we have a dataset that is in wide format where each sample is its own row and each species is its own column but we need to save it as a data frame to do anyfurther analysis.

data <- as.data.frame(data1)

Ordinaton

Now that the data is in a format that is suitable for ordination methods you can use the metaMDS function from the vegan package to run the NMDS and envfit to identify species or environmental variables which are driving the pattern.

dune.mds <- metaMDS(dune, distance = "bray", autotransform = FALSE)

## Run 0 stress 0.1192678 
## Run 1 stress 0.1192679 
## ... Procrustes: rmse 0.0001229708  max resid 0.0003715247 
## ... Similar to previous best
## Run 2 stress 0.1192679 
## ... Procrustes: rmse 7.501385e-05  max resid 0.0002322669 
## ... Similar to previous best
## Run 3 stress 0.1183186 
## ... New best solution
## ... Procrustes: rmse 0.02026874  max resid 0.06494965 
## Run 4 stress 0.1183186 
## ... Procrustes: rmse 6.995072e-05  max resid 0.0002238919 
## ... Similar to previous best
## Run 5 stress 0.2075713 
## Run 6 stress 0.1183186 
## ... Procrustes: rmse 7.751641e-06  max resid 2.498121e-05 
## ... Similar to previous best
## Run 7 stress 0.1192679 
## Run 8 stress 0.119268 
## Run 9 stress 0.1901493 
## Run 10 stress 0.119268 
## Run 11 stress 0.1183186 
## ... Procrustes: rmse 3.382625e-05  max resid 0.0001106193 
## ... Similar to previous best
## Run 12 stress 0.1183186 
## ... Procrustes: rmse 2.809714e-05  max resid 8.092168e-05 
## ... Similar to previous best
## Run 13 stress 0.1808911 
## Run 14 stress 0.1192679 
## Run 15 stress 0.1192678 
## Run 16 stress 0.1183186 
## ... Procrustes: rmse 3.085071e-05  max resid 7.09679e-05 
## ... Similar to previous best
## Run 17 stress 0.119268 
## Run 18 stress 0.1183186 
## ... Procrustes: rmse 5.956215e-05  max resid 0.0001992485 
## ... Similar to previous best
## Run 19 stress 0.1183186 
## ... New best solution
## ... Procrustes: rmse 8.709955e-06  max resid 2.963027e-05 
## ... Similar to previous best
## Run 20 stress 0.1183187 
## ... Procrustes: rmse 5.967417e-05  max resid 0.0001818531 
## ... Similar to previous best
## *** Solution reached

dune.mds # make note of the stress value, this shows how easy it was to condense multidimensional data into two dimensional space, below 0.2 is generally good

## 
## Call:
## metaMDS(comm = dune, distance = "bray", autotransform = FALSE) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     dune 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0.1183186 
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation, halfchange scaling 
## Species: expanded scores based on 'dune'

Ploting ordinations

You can use base R plotting functions to graph ordinations which are fine and fairly maniplative but you can also ggplotting functions which are a bit more fancy and more manipulative. Here there is code to do it either way

Using base R plotting

The dune dataset has two levels of categorisation; management option and land use type. I want to display both of these on one plot so I will use different symbols to display management options and different colours to display land use types.

plot(dune.mds) # displays sites and species

plot(dune.mds, type = "n") #displays empty ordination space
points(dune.mds, display = "sites", pch = c(16, 8, 17, 11) [as.numeric(dune.env$Management)], col = c("blue", "orange", "black") [as.numeric(dune.env$Use)]) # displays site points where symbols (pch) are different management options and colour (col) are different land uses
legend("topright", legend = c(levels(dune.env$Management), levels(dune.env$Use)), pch = c(16, 8, 17, 11, 16, 16, 16), col = c("black","black","black","black","blue", "orange", "black"), bty = "n", cex = 1) # displays symbol and colour legend
legend("topleft", "stress = 0.118", bty = "n", cex = 1) # displays legend text of stress value

You can also add ellipses or hulls.
Ordiplot is an alternative to the basic plot function but will do the same thing in this case.

ordiplot(dune.mds, type = "n", main = "ellipses")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
ordiellipse(dune.mds, groups = dune.env$Management, draw = "polygon", lty = 1, col = "grey90")

ordiplot(dune.mds, type = "n", main = "hulls")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
ordihull(dune.mds, groups = dune.env$Management, draw = "polygon", lty = 1, col = "grey90")

You can also investigate the species which may be driving the site distribution pattern, referred to as intrinsic variables.

dune.spp.fit <- envfit(dune.mds, dune, permutations = 999)
head(dune.spp.fit)

## $vectors
##             NMDS1    NMDS2     r2 Pr(>r)    
## Achimill -0.99991  0.01317 0.4160  0.010 ** 
## Agrostol  0.84322 -0.53758 0.7109  0.001 ***
## Airaprae -0.23061  0.97305 0.5034  0.005 ** 
## Alopgeni  0.40720 -0.91334 0.4044  0.010 ** 
## Anthodor -0.64679  0.76267 0.5430  0.004 ** 
## Bellpere -0.78420 -0.62051 0.1856  0.174    
## Bromhord -0.77617 -0.63052 0.2266  0.105    
## Chenalbu  0.40395 -0.91478 0.1018  0.434    
## Cirsarve -0.13497 -0.99085 0.0638  0.574    
## Comapalu  0.85099  0.52518 0.3082  0.045 *  
## Eleopalu  0.99024  0.13937 0.6264  0.002 ** 
## Elymrepe -0.39898 -0.91696 0.4414  0.008 ** 
## Empenigr -0.04253  0.99910 0.2335  0.103    
## Hyporadi -0.21341  0.97696 0.4810  0.003 ** 
## Juncarti  0.98403 -0.17802 0.3704  0.025 *  
## Juncbufo  0.27714 -0.96083 0.1739  0.200    
## Lolipere -0.80008 -0.59989 0.5552  0.001 ***
## Planlanc -0.87634  0.48170 0.3897  0.015 *  
## Poaprat  -0.70971 -0.70449 0.6187  0.002 ** 
## Poatriv  -0.23105 -0.97294 0.6088  0.001 ***
## Ranuflam  0.99697  0.07781 0.6625  0.002 ** 
## Rumeacet -0.93995 -0.34133 0.1108  0.359    
## Sagiproc  0.40002 -0.91651 0.0453  0.711    
## Salirepe  0.42430  0.90552 0.2732  0.068 .  
## Scorautu -0.44890  0.89358 0.3288  0.038 *  
## Trifprat -0.99608  0.08850 0.1175  0.349    
## Trifrepe -0.98731  0.15882 0.0160  0.857    
## Vicilath -0.55859  0.82944 0.1131  0.366    
## Bracruta  0.55368  0.83273 0.1145  0.373    
## Callcusp  0.94671  0.32207 0.5049  0.003 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
## 
## $factors
## NULL
## 
## $na.action
## function (object, ...) 
## UseMethod("na.action")
## <bytecode: 0x0000000006dfdd48>
## <environment: namespace:stats>

ordiplot(dune.mds, type = "n", main = "intrinsic species")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
plot(dune.spp.fit, p.max = 0.01, col = "black", cex = 0.7) # change the significance level of species shown with p.max

Environmental variables can also be used with envfit which are referred to as extrinsic variables. This works best with continuous variables of which there is only one (A1) in this dataset.If you only want to fit vector variables (continuous variables) use vectorfit and if you only want to fit factor variables (categorical variables) use factorfit but envfit can do this automatically.

dune.envfit <- envfit(dune.mds, dune.env, permutations = 999)
head(dune.envfit)

## $vectors
##      NMDS1   NMDS2     r2 Pr(>r)  
## A1 0.96473 0.26322 0.3649  0.026 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
## 
## $factors
## Centroids:
##                NMDS1   NMDS2
## Moisture1    -0.5101 -0.0403
## Moisture2    -0.3938  0.0139
## Moisture4     0.2765 -0.4033
## Moisture5     0.6561  0.1476
## ManagementBF -0.4534 -0.0102
## ManagementHF -0.2636 -0.1282
## ManagementNM  0.2958  0.5790
## ManagementSF  0.1506 -0.4670
## UseHayfield  -0.1568  0.3248
## UseHaypastu  -0.0412 -0.3370
## UsePasture    0.2854  0.0844
## Manure0       0.2958  0.5790
## Manure1      -0.2482 -0.0215
## Manure2      -0.3079 -0.1866
## Manure3       0.3101 -0.2470
## Manure4      -0.3463 -0.5583
## 
## Goodness of fit:
##                r2 Pr(>r)   
## Moisture   0.5014  0.002 **
## Management 0.4134  0.007 **
## Use        0.1871  0.153   
## Manure     0.4247  0.017 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
## 
## $na.action
## function (object, ...) 
## UseMethod("na.action")
## <bytecode: 0x0000000006dfdd48>
## <environment: namespace:stats>

ordiplot(dune.mds, type = "n", main = "extrinsic variables")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
plot(dune.envfit, col = "black", cex = 0.7)

For some more options and info on base R ordination plotting see this tutorial from Jon Lefcheck https://jonlefcheck.net/2012/10/24/nmds-tutorial-in-r/ or some more examples and community ecology analysis tips by David Zelený https://www.davidzeleny.net/anadat-r/doku.php/en:ordiagrams_examples

Plotting Using ggplot2

Although the base plotting functions in R are suitable for ordination plots, ggplot can provide some more functionality and different aesthetics more easily. ggplot does require that data for plotting is all saved as a data frame.

The following example is based off code provided by Christopher Chizinski but has been modified to use the same Dune dataset as above https://www.r-bloggers.com/reinventing-the-wheel-for-ordination-biplots-with-ggplot2/

For this example you need the ggplot2 and vegan packages

library(ggplot2) # for pretty plots
library(vegan)

Import data

data("dune")
data("dune.env")

The mds and envfit functions are run using the same method above

dune.mds <- metaMDS(dune, distance = "bray", autotransform = F)
dune.envfit <- envfit(dune.mds, dune.env, permutations = 999) # this fits environmental vectors
dune.spp.fit <- envfit(dune.mds, dune, permutations = 999) # this fits species vectors

To plot the output from the mds using ggplot a new datasheet needs to be created which contains the x,y points for each site. You can do this by calling the scores of you mds.

site.scrs <- as.data.frame(scores(dune.mds, display = "sites")) #save NMDS results into dataframe
site.scrs <- cbind(site.scrs, Management = dune.env$Management) #add grouping variable "Management" to dataframe
site.scrs <- cbind(site.scrs, Landuse = dune.env$Use) #add grouping variable of cluster grouping to dataframe
#site.scrs <- cbind(site.scrs, Site = rownames(site.scrs)) #add site names as variable if you want to display on plot

head(site.scrs)

##         NMDS1       NMDS2 Management  Landuse
## 1 -0.84051175 -0.71587030         SF Haypastu
## 2 -0.50485164 -0.40893973         BF Haypastu
## 3 -0.08266112 -0.43668223         SF Haypastu
## 4 -0.11561563 -0.52223983         SF Haypastu
## 5 -0.62655111 -0.08670437         HF Hayfield
## 6 -0.54270296  0.11315055         HF Haypastu

A new dataset containing species data also needs to be made to look at species vectors.
This is not necessary if you don’t want to show the species on the final graph.

spp.scrs <- as.data.frame(scores(dune.spp.fit, display = "vectors")) #save species intrinsic values into dataframe
spp.scrs <- cbind(spp.scrs, Species = rownames(spp.scrs)) #add species names to dataframe
spp.scrs <- cbind(spp.scrs, pval = dune.spp.fit$vectors$pvals) #add pvalues to dataframe so you can select species which are significant
#spp.scrs<- cbind(spp.scrs, abrev = abbreviate(spp.scrs$Species, minlength = 6)) #abbreviate species names
sig.spp.scrs <- subset(spp.scrs, pval<=0.05) #subset data to show species significant at 0.05

head(spp.scrs)

##               NMDS1        NMDS2  Species  pval
## Achimill -0.6448960  0.008496538 Achimill 0.010
## Agrostol  0.7109566 -0.453256126 Agrostol 0.001
## Airaprae -0.1636148  0.690374853 Airaprae 0.005
## Alopgeni  0.2589360 -0.580794776 Alopgeni 0.010
## Anthodor -0.4766061  0.561991023 Anthodor 0.004
## Bellpere -0.3378215 -0.267308503 Bellpere 0.174

To show environmental extrinsic variables another datasheet needs to be created

env.scores.dune <- as.data.frame(scores(dune.envfit, display = "vectors")) #extracts relevant scores from envifit
env.scores.dune <- cbind(env.scores.dune, env.variables = rownames(env.scores.dune)) #and then gives them their names

env.scores.dune <- cbind(env.scores.dune, pval = dune.envfit$vectors$pvals) # add pvalues to dataframe
sig.env.scrs <- subset(env.scores.dune, pval<=0.05) #subset data to show variables significant at 0.05

head(env.scores.dune)

##        NMDS1     NMDS2 env.variables  pval
## A1 0.5828059 0.1590172            A1 0.026

Now we have the relevant information for plotting the ordination in ggplot! Lets get plotting!

nmds.plot.dune <- ggplot(site.scrs, aes(x=NMDS1, y=NMDS2))+ #sets up the plot
  geom_point(aes(NMDS1, NMDS2, colour = factor(site.scrs$Management), shape = factor(site.scrs$Landuse)), size = 2)+ #adds site points to plot, shape determined by Landuse, colour determined by Management
  coord_fixed()+
  theme_classic()+ 
  theme(panel.background = element_rect(fill = NA, colour = "black", size = 1, linetype = "solid"))+
  labs(colour = "Management", shape = "Landuse")+ # add legend labels for Management and Landuse
  theme(legend.position = "right", legend.text = element_text(size = 12), legend.title = element_text(size = 12), axis.text = element_text(size = 10)) # add legend at right of plot

nmds.plot.dune + labs(title = "Basic ordination plot") #displays plot

The thing with ggplot is that you can add many layers onto the same plot using different geom_ options as well as manipulating every graphical feature and graph space.
Here are some other options which could be added to the ordination plot.

Significant Species

nmds.plot.dune+
  geom_segment(data = sig.spp.scrs, aes(x = 0, xend=NMDS1, y=0, yend=NMDS2), arrow = arrow(length = unit(0.25, "cm")), colour = "grey10", lwd=0.3) + #add vector arrows of significant species
  ggrepel::geom_text_repel(data = sig.spp.scrs, aes(x=NMDS1, y=NMDS2, label = Species), cex = 3, direction = "both", segment.size = 0.25)+ #add labels for species, use ggrepel::geom_text_repel so that labels do not overlap
  labs(title = "Ordination with species vectors")

Significant Environmental Variables

nmds.plot.dune+
  geom_segment(data = sig.env.scrs, aes(x = 0, xend=NMDS1, y=0, yend=NMDS2), arrow = arrow(length = unit(0.25, "cm")), colour = "grey10", lwd=0.3) + #add vector arrows of significant env variables
  ggrepel::geom_text_repel(data = sig.env.scrs, aes(x=NMDS1, y=NMDS2, label = env.variables), cex = 4, direction = "both", segment.size = 0.25)+ #add labels for env variables
  labs(title="Ordination with environmental vectors")

Ellipses

taken from the excellent stackoverflow Q+A: http://stackoverflow.com/questions/13794419/plotting-ordiellipse-function-from-vegan-package-onto-nmds-plot-created-in-ggplo

# function for ellipsess 

veganCovEllipse <- function (cov, center = c(0, 0), scale = 1, npoints = 100) 
{
  theta <- (0:npoints) * 2 * pi/npoints
  Circle <- cbind(cos(theta), sin(theta))
  t(center + scale * t(Circle %*% chol(cov)))
}
 
#data for ellipse, in this case using the management factor
df_ell.dune.management <- data.frame() #sets up a data frame before running the function.
for(g in levels(site.scrs$Management)){
  df_ell.dune.management <- rbind(df_ell.dune.management, cbind(as.data.frame(with(site.scrs [site.scrs$Management==g,],
                                                         veganCovEllipse(cov.wt(cbind(NMDS1,NMDS2),wt=rep(1/length(NMDS1),length(NMDS1)))$cov,center=c(mean(NMDS1),mean(NMDS2))))) ,Management=g))
}
 
# data for labelling the ellipse
NMDS.mean.dune=aggregate(site.scrs[ ,c("NMDS1", "NMDS2")], 
                         list(group = site.scrs$Management), mean)
 
# data for labelling the ellipse
NMDS.mean=aggregate(site.scrs[,c("NMDS1", "NMDS2")], 
                    list(group = site.scrs$Management), mean)

nmds.plot.dune+ 
geom_path(data = df_ell.dune.management, aes(x = NMDS1, y = NMDS2, group = Management)) #this is the ellipse, seperate ones by Site.

hulls

See Chris Chizinski’s worked example to see how to create data and add hulls to ordination plot

https://www.r-bloggers.com/reinventing-the-wheel-for-ordination-biplots-with-ggplot2/

Also see Chris’ worked example for ordisurf to add information about continuous variables to the ordination plot

https://chrischizinski.github.io/rstats/ordisurf/

Extras

For some extra tips and tricks with ggplot have a look at this cheat sheet from RStudio https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

As with anything in R there are many ways to create ggplot ordinations.
Here is another example using the ggord package from Marcus Beck which seems pretty stright forward as well and maybe doesn’t require as much fidling with data as this example https://www.r-bloggers.com/reinventing-the-wheel-for-ordination-biplots-with-ggplot2/

Another example has been taken from Olivia Burge and uses the same Dune dataset as above https://oliviarata.wordpress.com/2014/04/17/ordinations-in-ggplot2/

Session Info

sessionInfo()

## R version 3.4.3 (2017-11-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dplyr_0.8.0.1   ggplot2_3.2.1   vegan_2.5-4     lattice_0.20-38
## [5] permute_0.9-5  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1       compiler_3.4.3   pillar_1.4.2     tools_3.4.3     
##  [5] digest_0.6.18    evaluate_0.14    tibble_2.1.1     gtable_0.3.0    
##  [9] nlme_3.1-137     mgcv_1.8-28      pkgconfig_2.0.3  rlang_0.3.4     
## [13] Matrix_1.2-17    ggrepel_0.8.0    yaml_2.2.0       parallel_3.4.3  
## [17] xfun_0.10        withr_2.1.2      stringr_1.4.0    cluster_2.0.8   
## [21] knitr_1.25       grid_3.4.3       tidyselect_0.2.5 glue_1.3.1      
## [25] R6_2.4.0         rmarkdown_1.16   purrr_0.3.2      magrittr_1.5    
## [29] scales_1.0.0     htmltools_0.3.6  MASS_7.3-51.4    splines_3.4.3   
## [33] assertthat_0.2.1 colorspace_1.4-1 labeling_0.3     stringi_1.4.3   
## [37] lazyeval_0.2.2   munsell_0.5.0    crayon_1.3.4