Nonmetric Multidimensional Scaling (NMDS) Ordination

NMDS is a technique used to simplify multivariate data into a few important axes to facilitate recognition and interpretation of patterns and differences among groups.

For this tutorial we will be using metaMDS in the vegan package.

Always remember:

After setting the working directory with setwd you will want to pull in all the .csv files that you need to run your ordination Using read.csv(). Name your new data-frames accordingly.

To run an ordination you will need a data-frame consisting of plot by species (or trait) matrix AND a “groups” data-frame which should consist of plots with a coding variable for what group each plot belongs to, this will be used for plotting the ordination.

In the example code below, the “groups” variable has a 1-4 numbering system corresponding to commmunity types on Hog Island (1 = Dune, 2 = Swale) and Metompkin Island (3 = Dune, 4 = Swale)

# Set working directory
setwd("~/Desktop/Ordinations in R/Chapter_1 data/")

# Read in species matrix AND grouping variables
island.spp <- 
  read.csv("island_spp_cov.csv", row.names = 1) # This is our community data
island.spp_groups <- 
  read.csv("trait_spp_hab_groups.csv", row.names = 1) # This is our grouping data

Check-out your dataframes and ensure they are properly constructed!

Converting Absolute Abundance to Relative Abundance

Many multivariate analyses are sensitive to absolute abundance in a sample and can skew results, one solution for this is to take absolute abundance data and convert it to relative abundance estimates. We can do this in the vegan package using the decostand function.

Relative abundance is the percent composition of an organism relative to the total number of organisms in the area. There are a number of different methods to standardize data within decostand. To calculate relative abundance we will use "total".

# Calculating relative abundance and creating new dataframe with relative abundance data
island.spp.rel <-         
  decostand(island.spp, method = "total")

From this point on we will use the realtive abundance transformed dataframe.

Calculating your distance matrix

In order to do any distance-based multivariate analyses you have to calculate a distance matrix. Make sure that you are using the correct distance metric when calculating the matrix because distance-based techniques are sensitive to the distance metric that is chosen.

We will use the vegdist function to calculate our distance matrix. Since we are using abundance data we want to use the "bray" distance metric.

# Calculate distance matrix
island.spp_distmat <- 
  vegdist(island.spp.rel, method = "bray")

It will be important now to create a distance matrix that is easier to view and save it to your file as a .csv. Use the as.matrix function to write an easy to view distance matrix and the write.csv to write a new .csv file of your distance matrix.

# Creating easy to view matrix and writing .csv
island.spp_distmat <- 
  as.matrix(island.spp_distmat, labels = T)
write.csv(island.spp_distmat, "island_spp_distmat.csv")

Our newly calculated distance matrix will be an important piece for the remainder of our NMDS

Running NMDS using metaMDS

Now we are ready to run the NMDS!

We will do so using the metaMDS function. When running an NMDS you will have to identify your distance matrix (island.spp_distmat), your distance metric which should match the distance metric from your distance matrix (here we use "bray"), your selected number of dimensions ("k"), your max number of iterations (usually **"maxit = 999"**), and the maximum number of random starts (usually "trymax = 250"). You may need to do a couple of runs with different trymax values, especially if you are working with community data with a lot pf 0’s. Finally, wascores is a method of calculating species scores, default is TRUE. Check the metaMDS help file for other options to further customize your ordination if necessary.

# Running NMDS in vegan (metaMDS)
island.spp_NMS <-
  metaMDS(island.spp_distmat,
          distance = "bray",
          k = 3,
          maxit = 999, 
          trymax = 500,
          wascores = TRUE)

When you run the above code you will get a full result of your NMDS. The NMDS will run to a minimized stress value.

It is common for NMDS analyses to start by running with 2-dimensions (k), but you want to increase the number of dimensions to ensure a minimized stress value. Keep in mind that anything more than 5-dimensions makes it difficult to interpret a 2-dimensional plot.

As a rule of thumb literature has identified the following cut-off values for stress-level:

  • Higher than 0.2 is poor (risks for false interpretation).
  • 0.1 - 0.2 is fair (some distances can be misleading for interpretation).
  • 0.05 - 0.1 is good (can be confident in inferences from plot).
  • Less than 0.05 is excellent (this can be rare).

Running Goodness of Fit and Plotting Shepards diagram

One other way to check how well the ordination plots represent real data is by using the goodness function. You can produce goodness of fit statistics for each observation (points). You can also use the function stessplot to create a Shepard diagram displaying two correlation-like statistics for goodness of fit between ordination distances and observed dissimilarity. This shows how closely our ordination fits real world plot dissimilarities and how well we can interpret the ordination

# Shepards test/goodness of fit
goodness(island.spp_NMS) # Produces a results of test statistics for goodness of fit for each point

stressplot(island.spp_NMS) # Produces a Shepards diagram

This is an example of a Shepard diagram with correlation statistics indicating the fit between ordiantion distances and observed dissimilarities

You should proceed to steps of plotting your NMDS if you identify a minimized stress solution.

Steps to plotting your NMDS

We will start with some basics of plotting, then we will present more advanced plotting techniques.

To plot the NMDS we will use our NMDS analysis (island.spp_NMS) and the plot function. The default character and color symbol for points on the plot is open circles with black outlines. We will also use orditorp function to label points based on "sites". It is important that we identify that we want to plot "sites" NOT "species". You will likely get an error trying to run a plot with "species" because distance based plots do not have species information associated with them unless we identify a community data. You can manually add species scores after, we will go over this code next.

# Plotting points in ordination space
plot(island.spp_NMS, "sites")   # Produces distance 
orditorp(island.spp_NMS, "sites")   # Gives points labels

With **metaMDS** the ordination will go through a PCA rotation which will result in a plot that exhibits the highest amount of variation in axes 1 and 2. This aids in interpretation of the NMDS plot

We will worry about plotting species scores at the end of this tutorial because we are more interested in clusters by our community type (remember this is our “groups” data-frame).

Altering color coding and symbolizing points

First thing that we want to do is identify the color and character symbols that we want to represent our different community types in our NMDS ordination. We can do this by assigning colors for each our community vectors using colvec (you can selected any color). Next, we can assign symbol characters for each community vectors using pchvec.

colvec <- c("gray0", "gray0", "gray49", "gray49")   # Identifies colors for group assignments
pchvec <- c(21, 21, 22, 22)   # Identifies character symbols for group assignments

The color and character symbols used in this code is to identify observations (points) that belong to Hog Island and oberservations (points) that belong to Metompkin Island

We will plot the figure just like we did before, using the plot function. However, this time we will want to create the plat with our grouping variable and newly defined colvec and pchvec values, coding for point color and shape, respectively. We can also add convex hulls (using ordihull) and centroids to the plot for each groups with colors specified by colvec to match the colors that code for our island observations.

plot(island.spp_NMS)
with(island.spp_groups,
     points(island.spp_NMS,
            display = "sites",
            col = "black",
            pch = pchvec[habitat],
            bg = colvec[habitat]))

#Create convex hulls that highlight point clusters based on grouping dataframe
ordihull(
  island.spp_NMS,
  island.spp_groups$habitat,
  display = "sites",
  draw = c("polygon"),
  col = NULL,
  border = c("gray0", "gray0", "gray48", "gray48"),
  lty = c(1, 2, 1, 2),
  lwd = 2.5
  )

# Calculating centroids 

# You can calculate centroids for your groups which can be viewed as the average position of observations in ordination space. 

# Calculating and plotting centroids of NMDS Result
scrs <-
  scores(island.spp_NMS, display = "sites", "species")
cent <-
  aggregate(scrs ~ habitat, data = island.spp_groups, FUN = "mean")
names(cent) [-1] <- colnames(scrs)
points(cent [,-1],
       pch = c( 8 , 8 , 8, 8),
       col = c("gray0", "gray0", "gray48", "gray48"),
       bg = c("black"),
       lwd = 3.0,
       cex = 2.0 # Plots centroids as points on ordination
       )

Calculating and plotting species scores

If you are interested in showing species scores we have the ability to do that because we have community data available.

This code:

island.spp_scrs <- 
  sppscores(island.spp_NMS) <- island.spp.rel

should be used if you are interested in identifying species scores on your distance-based plot. This uses the sppscores function to identify species scores from your "island.spp.rel" data-frame to plot in your ordination.

After calculating your species scores you can add it to your NMDS plot using the plot function by telling the plot to display "species".

# Plotting points in ordination space
plot(island.spp_NMS, "sites")   # Produces distance based bi-plot
plot(island.spp_NMS, "species")   # Plots species scores
orditorp(island.spp_NMS, "sites")   # Gives points labels

It is also important to note that species scores result can be formulated into a table to show point correlations with species scores. This can help to understand why some observations (points) are more similar in species compositions than others. Correlations with each axis can also be investigated by calculating Pearson correlation coefficients, which we will cover after plotting.

Calculating Pearson correlation coefficients for dimensions of NMDS plot

Here we use the cor function to create a new data-frame (island.spp_cor) that gives correlations of each species with the 3 axes we used to calculate the NMDS. We need to identify x which will be our realtiveized matrix of plots and species (island.spp.rel). We also need to identify y this is the data we will be correlating with the species matrix, here we want to use the distance-based points in our NMDS. You can select for this data with the line - island.spp_NMS$points. Next, use is an optional character string giving a method for computing co-variances in the presence of missing values. In this example we will code for complete.obs which will use case-wise deletions of missing values. The last element of this function is method selection, here we will use pearson to get Pearson correlation coefficients. Last we can write a .csv file of the correlation coefficients using the write.csv function.

island.spp_cor <-
  cor(island.spp.rel,
      island.spp_NMS$points,
      use = "complete.obs",
      method = "pearson")
write.csv(island.spp_cor, file = "island_spp_PearsonCor.csv")

Now you are ready to try with your own data!

# Load necessary packages
library(vegan)

# Set working directory
setwd("~/Desktop/Ordinations in R/Chapter_1 data/")

# Read in your data
island.spp <- 
  read.csv("island_spp_cov.csv", row.names = 1)
island.spp.rel <-         #transforming to relative abundance
  decostand(island.spp, method = "total")
island.spp_groups <- 
  read.csv("trait_spp_hab_groups.csv", row.names = 1)

# Calculate distance matrix - make distance matrix data frame - write .csv
island.spp_distmat <- 
  vegdist(island.spp.rel, method = "bray")
island.spp_DistanceMatrix <- 
  as.matrix(island.spp_distmat, labels = T)
write.csv(island.spp_DistanceMatrix, "island_spp_distmat.csv")

# Running NMDS in vegan (metaMDS)
island.spp_NMS <-
  metaMDS(island.spp_distmat,
          distance = "bray",
          k = 3,
          maxit = 999, 
          trymax = 250)
island.spp_scrs <-
  `sppscores<-`(island.spp_NMS, island.spp)
island.spp_cor <-
  cor(island.spp,
      island.spp_NMS$points,
      use = "complete.obs",
      method = "pearson")
write.csv(island.spp_cor, file = "island_spp_PearsonCor.csv")

# Shepards test/goodness of fit
goodness(island.spp_NMS)
stressplot(island.spp_NMS)
# Plot Island trait ordintion w/o Hog marsh - No Delta N15
plot(island.spp_NMS)
orditorp(island.spp_NMS, "sites")   # Gives plot labels
with(island.spp_groups, levels(habitat))   # Sites according to Island
colvec <- c("gray0", "gray0", "gray49", "gray49")
pchvec <- c(21, 21, 22, 22)
plot(island.spp_NMS)
with(island.spp_groups,
     points(island.spp_NMS,
            display = "sites",
            col = "black",
            pch = pchvec[habitat],
            bg = colvec[habitat]))
ordihull(
  island.spp_NMS,
  island.spp_groups$habitat,
  display = "sites",
  draw = c("polygon"),
  col = NULL,
  border = c("gray0", "gray0", "gray48", "gray48"),
  lty = c(1, 2, 1, 2),
  lwd = 2.5
  )
# Calculating and plotting centroids of NMDS Result
scrs <-
  scores(island.spp_NMS, display = "sites", "species")
cent <-
  aggregate(scrs ~ habitat, data = island.spp_groups, FUN = "mean")
names(cent) [-1] <- colnames(scrs)
points(cent [,-1],
       pch = c( 8 , 8 , 8, 8),
       col = c("gray0", "gray0", "gray48", "gray48"),
       bg = c("black"),
       lwd = 3.0,
       cex = 2.0 # Plots centroids as points on ordination
       )