Before you begin

Save data as csv files (these are easy for R to work with)
Set working directory: this is the folder that R will look in when searching for files (Session, Set working directory, Choose directory… OR Control+Shift+H for windows Command+Shift+H for mac)
Install or load required packages: if you get an error Error in library(vegan) : there is no package called ‘vegan’ it is because the package is not installed. Use install.packages("vegan") and then run library(vegan) again.

library(vegan) # for vegetation and community analysis
library(dplyr) # for data manipulation

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

It is ok to see these warnings! R is just telling you that it has read the packages correctly but they were built in different versions of R than the one you are currently using, it will not effect the function of the package!

Import and transform data

The data used for this example is part of the vegan package and does not need to be imported or transformed

Before you can do any sort of ordination analysis you will need to 1. Import data from csv file 2. Transform your data from long format (multiple lines for each site) into wide format (single row for each site, species as columns)

dune <- read.csv(file = "dune.csv", header = TRUE, row.names = 1) # header = TRUE for data with column names, row.names = 1 for data where the site name is in the first column
dune.env <- read.csv(file = "dune_env.csv", header = T, row.names = 1)

#alternatively for this example you can import using
data("dune")
data("dune.env")

The data used in this example doesn’t need transformation as it is already in wide format but this is how you can do it.

After reading in the raw data, there is many rows with data for the same species but from different samples or sites, thsi needs to be transformed it into wide format. To do that you need to create a matrix that is as long as the number of samples and as wide as the number of species observed.

taxa <- unique(data$Species) #creates list of each unique species
samples <- sort(unique(data$Sample.ID)) #creates list of each unique site or sample

#make empty matrix "data1" ready to fill in
data1 <- matrix(nrow = length(samples), ncol = length(taxa), dimnames = list(samples,taxa))

for(r in 1:nrow(LL.data)){
  samp <- LL.data[r, 1]
  tax <- LL.data[r, 2]
  data1[samp,tax] <- LL.data[r, 3]
} # 1, 2, 3 here relate the the column number in the raw data in which the sample name, species name and data are in

data1[is.na(data1)] <- 0   #convert NA's to 0

Now we have a dataset that is in wide format where each sample is its own row and each species is its own column but we need to save it as a data frame to do anyfurther analysis.

data <- as.data.frame(data1)

Ordinaton

Now that the data is in a format that is suitable for ordination methods you can use the metaMDS function from the vegan package to run the NMDS and envfit to identify species or environmental variables which are driving the pattern.

dune.mds <- metaMDS(dune, distance = "bray", autotransform = FALSE)

## Run 0 stress 0.1192678 
## Run 1 stress 0.119268 
## ... Procrustes: rmse 0.0002513026  max resid 0.0007733967 
## ... Similar to previous best
## Run 2 stress 0.1192681 
## ... Procrustes: rmse 0.0002405236  max resid 0.000728375 
## ... Similar to previous best
## Run 3 stress 0.1809577 
## Run 4 stress 0.1889648 
## Run 5 stress 0.1183186 
## ... New best solution
## ... Procrustes: rmse 0.02026976  max resid 0.06495745 
## Run 6 stress 0.1183186 
## ... Procrustes: rmse 7.669336e-05  max resid 0.0002139572 
## ... Similar to previous best
## Run 7 stress 0.1183186 
## ... New best solution
## ... Procrustes: rmse 1.081799e-05  max resid 2.069783e-05 
## ... Similar to previous best
## Run 8 stress 0.1192679 
## Run 9 stress 0.1183186 
## ... Procrustes: rmse 1.441451e-05  max resid 4.489045e-05 
## ... Similar to previous best
## Run 10 stress 0.1889677 
## Run 11 stress 0.1183186 
## ... Procrustes: rmse 3.107946e-05  max resid 0.0001067073 
## ... Similar to previous best
## Run 12 stress 0.1192679 
## Run 13 stress 0.1192679 
## Run 14 stress 0.2834864 
## Run 15 stress 0.1183186 
## ... Procrustes: rmse 7.978353e-06  max resid 1.99185e-05 
## ... Similar to previous best
## Run 16 stress 0.1183186 
## ... Procrustes: rmse 5.056014e-06  max resid 1.108435e-05 
## ... Similar to previous best
## Run 17 stress 0.1183186 
## ... Procrustes: rmse 5.995923e-06  max resid 1.740042e-05 
## ... Similar to previous best
## Run 18 stress 0.1808917 
## Run 19 stress 0.119268 
## Run 20 stress 0.1183186 
## ... Procrustes: rmse 2.456578e-05  max resid 7.888567e-05 
## ... Similar to previous best
## *** Solution reached

dune.mds # make note of the stress value, this shows how easy it was to condense multidimensional data into two dimensional space, below 0.2 is generally good

## 
## Call:
## metaMDS(comm = dune, distance = "bray", autotransform = FALSE) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     dune 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0.1183186 
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation, halfchange scaling 
## Species: expanded scores based on 'dune'

Ploting ordinations

You can use base R plotting functions to graph ordinations which are fine and fairly maniplative but you can also ggplotting functions which are a bit more fancy and more manipulative. Here there is code to do it either way

Using base R plotting

plot(dune.mds) # displays sites and species

plot(dune.mds, type = "n") #displays empty ordination space
points(dune.mds, display = "sites", pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)]) # displays site points where symbols (pch) and colour (col) are different management options
legend("topright", legend = levels(dune.env$Management), pch = c(16, 8, 17, 18), col = c("green", "blue", "orange", "black"), bty = "n", cex = 1) # displays symbol and colour legend
legend("topleft", "stress = 0.118", bty = "n", cex = 1) # displays legend text of stress value

You can also add ellipses or hulls. Ordiplot is an alternative to the basic plot function but will do the same thing in this case.

ordiplot(dune.mds, type = "n", main = "ellipses")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
ordiellipse(dune.mds, groups = dune.env$Management, draw = "polygon", lty = 1, col = "grey90")

ordiplot(dune.mds, type = "n", main = "hulls")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
ordihull(dune.mds, groups = dune.env$Management, draw = "polygon", lty = 1, col = "grey90")

You can also investigate the species which may be driving the site distribution pattern, referred to as intrinsic variables.

intrinsics <- envfit(dune.mds, dune, permutations = 999)
head(intrinsics)

## $vectors
##             NMDS1    NMDS2     r2 Pr(>r)    
## Achimill -0.99991  0.01321 0.4160  0.011 *  
## Agrostol  0.84320 -0.53760 0.7109  0.001 ***
## Airaprae -0.23058  0.97305 0.5034  0.007 ** 
## Alopgeni  0.40718 -0.91335 0.4044  0.011 *  
## Anthodor -0.64677  0.76268 0.5430  0.005 ** 
## Bellpere -0.78421 -0.62050 0.1856  0.164    
## Bromhord -0.77618 -0.63051 0.2266  0.096 .  
## Chenalbu  0.40393 -0.91479 0.1018  0.392    
## Cirsarve -0.13500 -0.99085 0.0638  0.573    
## Comapalu  0.85100  0.52517 0.3082  0.038 *  
## Eleopalu  0.99024  0.13935 0.6264  0.001 ***
## Elymrepe -0.39901 -0.91695 0.4414  0.012 *  
## Empenigr -0.04252  0.99910 0.2335  0.100 .  
## Hyporadi -0.21339  0.97697 0.4810  0.002 ** 
## Juncarti  0.98402 -0.17805 0.3704  0.015 *  
## Juncbufo  0.27712 -0.96084 0.1739  0.186    
## Lolipere -0.80010 -0.59986 0.5552  0.002 ** 
## Planlanc -0.87633  0.48171 0.3897  0.017 *  
## Poaprat  -0.70973 -0.70447 0.6187  0.001 ***
## Poatriv  -0.23106 -0.97294 0.6088  0.002 ** 
## Ranuflam  0.99697  0.07778 0.6625  0.001 ***
## Rumeacet -0.93994 -0.34134 0.1108  0.368    
## Sagiproc  0.39997 -0.91653 0.0453  0.712    
## Salirepe  0.42432  0.90551 0.2732  0.061 .  
## Scorautu -0.44890  0.89358 0.3288  0.029 *  
## Trifprat -0.99608  0.08850 0.1175  0.351    
## Trifrepe -0.98732  0.15871 0.0160  0.870    
## Vicilath -0.55858  0.82945 0.1131  0.347    
## Bracruta  0.55370  0.83272 0.1145  0.366    
## Callcusp  0.94672  0.32206 0.5049  0.004 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
## 
## $factors
## NULL
## 
## $na.action
## function (object, ...) 
## UseMethod("na.action")
## <bytecode: 0x7fd5ca8d54d0>
## <environment: namespace:stats>

ordiplot(dune.mds, type = "n", main = "intrinsic species")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
plot(intrinsics, p.max = 0.01, col = "black", cex = 0.7) # change the significance level of species shown with p.max

Environmental variables can also be used with envfit which are referred to as extrinsic variables. This works best with continuous variables of which there is only one (A1) in this dataset.If you only want to fit vector variables (continuous variables) use vectorfit and if you only want to fit factor variables (categorical variables) use factorfit.

extrinsics <- envfit(dune.mds, dune.env, permutations = 999)
head(extrinsics)

## $vectors
##      NMDS1   NMDS2    r2 Pr(>r)  
## A1 0.96474 0.26320 0.365   0.02 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
## 
## $factors
## Centroids:
##                NMDS1   NMDS2
## Moisture1    -0.5101 -0.0403
## Moisture2    -0.3938  0.0139
## Moisture4     0.2765 -0.4033
## Moisture5     0.6561  0.1476
## ManagementBF -0.4534 -0.0102
## ManagementHF -0.2636 -0.1282
## ManagementNM  0.2958  0.5790
## ManagementSF  0.1506 -0.4670
## UseHayfield  -0.1568  0.3248
## UseHaypastu  -0.0412 -0.3370
## UsePasture    0.2854  0.0844
## Manure0       0.2958  0.5790
## Manure1      -0.2482 -0.0215
## Manure2      -0.3079 -0.1866
## Manure3       0.3101 -0.2470
## Manure4      -0.3463 -0.5582
## 
## Goodness of fit:
##                r2 Pr(>r)    
## Moisture   0.5014  0.001 ***
## Management 0.4134  0.005 ** 
## Use        0.1871  0.112    
## Manure     0.4247  0.023 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
## 
## $na.action
## function (object, ...) 
## UseMethod("na.action")
## <bytecode: 0x7fd5ca8d54d0>
## <environment: namespace:stats>

ordiplot(dune.mds, type = "n", main = "extrinsic variables")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
plot(extrinsics, col = "black", cex = 0.7)

For some more options and info on base R ordination plotting see this tutorial from Jon Lefcheck https://jonlefcheck.net/2012/10/24/nmds-tutorial-in-r/ or some more examples and community ecology analysis tips by David Zelený https://www.davidzeleny.net/anadat-r/doku.php/en:ordiagrams_examples

Plotting Using ggplot2

The following example has been taken from Olivia Burge and uses the same Dune dataset as above https://oliviarata.wordpress.com/2014/04/17/ordinations-in-ggplot2/

For this example you need the ggplot2 and grid packages

library(ggplot2) # for pretty plots
library(grid) # for envfit arrows on ordination plot

Import data

data("dune")
data("dune.env")

The mds and envfit functions are run using the same method above

meta.mds.dune <- metaMDS(dune, distance = "bray", autotransform = F)

## Run 0 stress 0.1192678 
## Run 1 stress 0.1183186 
## ... New best solution
## ... Procrustes: rmse 0.02027283  max resid 0.06496793 
## Run 2 stress 0.1183186 
## ... Procrustes: rmse 0.0001033504  max resid 0.0003055187 
## ... Similar to previous best
## Run 3 stress 0.1183186 
## ... New best solution
## ... Procrustes: rmse 3.317876e-05  max resid 0.0001073125 
## ... Similar to previous best
## Run 4 stress 0.1183186 
## ... Procrustes: rmse 2.347595e-05  max resid 5.498587e-05 
## ... Similar to previous best
## Run 5 stress 0.1192679 
## Run 6 stress 0.1183186 
## ... Procrustes: rmse 1.61948e-05  max resid 5.197715e-05 
## ... Similar to previous best
## Run 7 stress 0.1192679 
## Run 8 stress 0.1192679 
## Run 9 stress 0.1183186 
## ... New best solution
## ... Procrustes: rmse 8.927676e-06  max resid 2.969129e-05 
## ... Similar to previous best
## Run 10 stress 0.1192678 
## Run 11 stress 0.1183186 
## ... Procrustes: rmse 2.779054e-05  max resid 9.468145e-05 
## ... Similar to previous best
## Run 12 stress 0.1192678 
## Run 13 stress 0.3641834 
## Run 14 stress 0.1192678 
## Run 15 stress 0.1183186 
## ... Procrustes: rmse 3.358864e-06  max resid 9.976996e-06 
## ... Similar to previous best
## Run 16 stress 0.1192684 
## Run 17 stress 0.1192679 
## Run 18 stress 0.1192679 
## Run 19 stress 0.119268 
## Run 20 stress 0.1183186 
## ... New best solution
## ... Procrustes: rmse 2.637718e-06  max resid 4.156931e-06 
## ... Similar to previous best
## *** Solution reached

dune.envfit <- envfit(meta.mds.dune, dune.env, permutations = 999)

To plot the output from the mds a new datasheet needs to be created which contains the x,y points for each site

dune.nmds.data <- dune.env
dune.nmds.data$NMDS1 <- meta.mds.dune$points[ ,1]
dune.nmds.data$NMDS2 <- meta.mds.dune$points[ ,2]

A new dataset containing species data also needs to be made. This is not necessary if you don’t want to show the species on the final graph. Alternatively you can calculate intrinsic variables as above and plot them using code similar to the environmental extrinsics below

sp.abund<-colSums(dune) #total abundances for each species
spps <- data.frame(scores(meta.mds.dune, display = "species")) #dataframe of species scoes for plotting
spps$species <- row.names(spps) # making a column with species names
spps$colsums <- sp.abund #adding the colSums from above
spps<-spps[!is.na(spps$NMDS1) & !is.na(spps$NMDS2),] #removes NAs
spps.colmedian <- median(spps$colsums) #create an object that is the median of the abundance of the measured species
spps.colmean <- mean(spps$colsums) #creates a mean instead if you wish to use

spps2 <- subset(spps,spps$colsums > spps.colmean) #select the most abundant species. Could discard fewer by going something like - spps$colsums>(spps.colmedian/2) instead
spps2$species <- factor(spps2$species) #otherwise factor doesn't drop unused levels and it will throw an error

To show environmental extrinsic variables another datasheet needs to be created

env.scores.dune <- as.data.frame(scores(dune.envfit, display = "vectors")) #extracts relevant scores from envifit
env.scores.dune <- cbind(env.scores.dune, env.variables = rownames(env.scores.dune)) #and then gives them their names

Now we have the relevant information for plotting the ordination in ggplot! Lets get plotting!

mult <- 2 #multiplier for the arrows and text for envfit below. You can change this and then rerun the plot command.
dune.nmds.gg1 <- ggplot(data = dune.nmds.data, aes(y = NMDS2, x = NMDS1))+ #sets up the plot. 
   geom_point(aes(shape = Management, col = Management), size = 3 ) + #puts the site points in from the ordination, shape determined by site, size refers to size of point
    geom_segment(data = env.scores.dune,
                    aes(x = 0, xend = mult*NMDS1, y = 0, yend = mult*NMDS2),
                    arrow = arrow(length = unit(0.25, "cm")), colour = "blue") + #arrows for envfit.  doubled the length for similarity to the plot() function. NB check ?envfit regarding arrow length if not familiar with lengths
       geom_text(data = env.scores.dune, #labels the environmental variable arrows * "mult" as for the arrows
                 aes(x = mult*NMDS1, y = mult*NMDS2, label=env.variables),size = 5, hjust = -0.5)+
   scale_shape_manual(values = c(16, 8, 17, 18))+ #sets the shape of the plot points instead of using whatever ggplot2 automatically provides
    scale_color_manual(values = c("green", "blue", "orange", "black"))+
    coord_cartesian(xlim = c(-1,1.5))+  ## NB this changes the visible area of the plot only (this is a good thing, apparently). Can also specify ylim. Here in case you want to set xaxis manually.
   theme_classic()+ # plain graph type with no grid lines and white background
  theme(panel.background = element_rect(fill = NA, 
        colour = "black", linetype = "solid"), # adds border around plot area
    legend.key = element_rect(fill = NA, linetype = "blank")) # removes box and shading around legend symbols

dune.nmds.gg1 #displays plot output

The thing with ggplot is that you can add many layers onto the same plot using different geom_ options as well as manipulating every graphical feature and graph space. Here are some other options which could be added to the ordination plot.

Ellipses

# function for ellipsess 
#taken from the excellent stackoverflow Q+A: http://stackoverflow.com/questions/13794419/plotting-ordiellipse-function-from-vegan-package-onto-nmds-plot-created-in-ggplo
veganCovEllipse <- function (cov, center = c(0, 0), scale = 1, npoints = 100) 
{
  theta <- (0:npoints) * 2 * pi/npoints
  Circle <- cbind(cos(theta), sin(theta))
  t(center + scale * t(Circle %*% chol(cov)))
}
 
#data for ellipse, in this case using the management factor
df_ell.dune.management <- data.frame() #sets up a data frame before running the function.
for(g in levels(dune.nmds.data$Management)){
  df_ell.dune.management <- rbind(df_ell.dune.management, cbind(as.data.frame(with(dune.nmds.data [dune.nmds.data$Management==g,],
                                                         veganCovEllipse(cov.wt(cbind(NMDS1,NMDS2),wt=rep(1/length(NMDS1),length(NMDS1)))$cov,center=c(mean(NMDS1),mean(NMDS2))))) ,Management=g))
}
 
# data for labelling the ellipse
NMDS.mean.dune=aggregate(dune.nmds.data[ ,c("NMDS1", "NMDS2")], 
                         list(group = dune.nmds.data$Management), mean)
 
# data for labelling the ellipse
NMDS.mean=aggregate(dune.nmds.data[,c("NMDS1", "NMDS2")], 
                    list(group = dune.nmds.data$Management), mean)

dune.nmds.gg1+ 
geom_path(data = df_ell.dune.management, aes(x = NMDS1, y = NMDS2, group = Management)) #this is the ellipse, seperate ones by Site.

Extras

For some extra tips and tricks with ggplot have a look at this cheat sheet from RStudio https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

As with anything in R there are many ways to create ggplot ordinations. Here is another example using the ggord package from Marcus Beck which seems pretty stright forward as well and maybe doesn’t require as much fidling with data as this example https://www.r-bloggers.com/reinventing-the-wheel-for-ordination-biplots-with-ggplot2/

Session Info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: OS X El Capitan 10.11.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] dplyr_0.8.3     ggplot2_3.2.1   vegan_2.5-6     lattice_0.20-38
## [5] permute_0.9-5  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2       compiler_3.6.1   pillar_1.4.2     tools_3.6.1     
##  [5] digest_0.6.20    evaluate_0.14    tibble_2.1.3     gtable_0.3.0    
##  [9] nlme_3.1-140     mgcv_1.8-28      pkgconfig_2.0.2  rlang_0.4.0     
## [13] Matrix_1.2-17    yaml_2.2.0       parallel_3.6.1   xfun_0.8        
## [17] withr_2.1.2      stringr_1.4.0    cluster_2.1.0    knitr_1.23      
## [21] tidyselect_0.2.5 glue_1.3.1       R6_2.4.0         rmarkdown_1.13  
## [25] purrr_0.3.2      magrittr_1.5     scales_1.0.0     htmltools_0.3.6 
## [29] MASS_7.3-51.4    splines_3.6.1    assertthat_0.2.1 colorspace_1.4-1
## [33] labeling_0.3     stringi_1.4.3    lazyeval_0.2.2   munsell_0.5.0   
## [37] crayon_1.3.4

NMDS Ordination

A tutorial for vegetation community multivariate analysis with the vegan package

Rebekah Grieger

30 September 2019