NMDS is a technique used to simplify multivariate data into a few important axes to facilitate recognition and interpretation of patterns and differences among groups.
For this tutorial we will be using metaMDS
in the vegan
package.
Always remember:
vegan
.After setting the working directory with setwd
you will want to pull in all the .csv files that you need to run your ordination Using read.csv()
. Name your new data-frames accordingly.
To run an ordination you will need a data-frame consisting of plot by species (or trait) matrix AND a “groups” data-frame which should consist of plots with a coding variable for what group each plot belongs to, this will be used for plotting the ordination.
In the example code below, the “groups” variable has a 1-4 numbering system corresponding to commmunity types on Hog Island (1 = Dune, 2 = Swale) and Metompkin Island (3 = Dune, 4 = Swale)
# Set working directory
setwd("~/Desktop/Ordinations in R/Chapter_1 data/")
# Read in species matrix AND grouping variables
island.spp <-
read.csv("island_spp_cov.csv", row.names = 1) # This is our community data
island.spp_groups <-
read.csv("trait_spp_hab_groups.csv", row.names = 1) # This is our grouping data
Check-out your dataframes and ensure they are properly constructed!
Many multivariate analyses are sensitive to absolute abundance in a sample and can skew results, one solution for this is to take absolute abundance data and convert it to relative abundance estimates. We can do this in the vegan
package using the decostand
function.
Relative abundance is the percent composition of an organism relative to the total number of organisms in the area. There are a number of different methods to standardize data within decostand
. To calculate relative abundance we will use "total"
.
# Calculating relative abundance and creating new dataframe with relative abundance data
island.spp.rel <-
decostand(island.spp, method = "total")
From this point on we will use the realtive abundance transformed dataframe.
In order to do any distance-based multivariate analyses you have to calculate a distance matrix. Make sure that you are using the correct distance metric when calculating the matrix because distance-based techniques are sensitive to the distance metric that is chosen.
We will use the vegdist
function to calculate our distance matrix. Since we are using abundance data we want to use the "bray"
distance metric.
# Calculate distance matrix
island.spp_distmat <-
vegdist(island.spp.rel, method = "bray")
It will be important now to create a distance matrix that is easier to view and save it to your file as a .csv. Use the as.matrix
function to write an easy to view distance matrix and the write.csv
to write a new .csv file of your distance matrix.
# Creating easy to view matrix and writing .csv
island.spp_distmat <-
as.matrix(island.spp_distmat, labels = T)
write.csv(island.spp_distmat, "island_spp_distmat.csv")
Our newly calculated distance matrix will be an important piece for the remainder of our NMDS
metaMDS
Now we are ready to run the NMDS!
We will do so using the metaMDS
function. When running an NMDS you will have to identify your distance matrix (island.spp_distmat
), your distance metric which should match the distance metric from your distance matrix (here we use "bray"
), your selected number of dimensions ("k"
), your max number of iterations (usually **"maxit = 999"**
), and the maximum number of random starts (usually "trymax = 250"
). You may need to do a couple of runs with different trymax
values, especially if you are working with community data with a lot pf 0’s. Finally, wascores
is a method of calculating species scores, default is TRUE. Check the metaMDS
help file for other options to further customize your ordination if necessary.
# Running NMDS in vegan (metaMDS)
island.spp_NMS <-
metaMDS(island.spp_distmat,
distance = "bray",
k = 3,
maxit = 999,
trymax = 500,
wascores = TRUE)
When you run the above code you will get a full result of your NMDS. The NMDS will run to a minimized stress value.
It is common for NMDS analyses to start by running with 2-dimensions (k
), but you want to increase the number of dimensions to ensure a minimized stress value. Keep in mind that anything more than 5-dimensions makes it difficult to interpret a 2-dimensional plot.
As a rule of thumb literature has identified the following cut-off values for stress-level:
One other way to check how well the ordination plots represent real data is by using the goodness
function. You can produce goodness of fit statistics for each observation (points). You can also use the function stessplot
to create a Shepard diagram displaying two correlation-like statistics for goodness of fit between ordination distances and observed dissimilarity. This shows how closely our ordination fits real world plot dissimilarities and how well we can interpret the ordination
# Shepards test/goodness of fit
goodness(island.spp_NMS) # Produces a results of test statistics for goodness of fit for each point
stressplot(island.spp_NMS) # Produces a Shepards diagram
This is an example of a Shepard diagram with correlation statistics indicating the fit between ordiantion distances and observed dissimilarities
You should proceed to steps of plotting your NMDS if you identify a minimized stress solution.
We will start with some basics of plotting, then we will present more advanced plotting techniques.
To plot the NMDS we will use our NMDS analysis (island.spp_NMS) and the plot
function. The default character and color symbol for points on the plot
is open circles with black outlines. We will also use orditorp
function to label points based on "sites"
. It is important that we identify that we want to plot "sites"
NOT "species"
. You will likely get an error trying to run a plot with "species"
because distance based plots do not have species information associated with them unless we identify a community data. You can manually add species scores after, we will go over this code next.
# Plotting points in ordination space
plot(island.spp_NMS, "sites") # Produces distance
orditorp(island.spp_NMS, "sites") # Gives points labels
With **metaMDS**
the ordination will go through a PCA rotation which will result in a plot that exhibits the highest amount of variation in axes 1 and 2. This aids in interpretation of the NMDS plot
We will worry about plotting species scores at the end of this tutorial because we are more interested in clusters by our community type (remember this is our “groups” data-frame).
First thing that we want to do is identify the color and character symbols that we want to represent our different community types in our NMDS ordination. We can do this by assigning colors for each our community vectors using colvec
(you can selected any color). Next, we can assign symbol characters for each community vectors using pchvec
.
colvec <- c("gray0", "gray0", "gray49", "gray49") # Identifies colors for group assignments
pchvec <- c(21, 21, 22, 22) # Identifies character symbols for group assignments
The color and character symbols used in this code is to identify observations (points) that belong to Hog Island and oberservations (points) that belong to Metompkin Island
We will plot the figure just like we did before, using the plot
function. However, this time we will want to create the plat with our grouping variable and newly defined colvec
and pchvec
values, coding for point color and shape, respectively. We can also add convex hulls (using ordihull
) and centroids to the plot for each groups with colors specified by colvec
to match the colors that code for our island observations.
plot(island.spp_NMS)
with(island.spp_groups,
points(island.spp_NMS,
display = "sites",
col = "black",
pch = pchvec[habitat],
bg = colvec[habitat]))
#Create convex hulls that highlight point clusters based on grouping dataframe
ordihull(
island.spp_NMS,
island.spp_groups$habitat,
display = "sites",
draw = c("polygon"),
col = NULL,
border = c("gray0", "gray0", "gray48", "gray48"),
lty = c(1, 2, 1, 2),
lwd = 2.5
)
# Calculating centroids
# You can calculate centroids for your groups which can be viewed as the average position of observations in ordination space.
# Calculating and plotting centroids of NMDS Result
scrs <-
scores(island.spp_NMS, display = "sites", "species")
cent <-
aggregate(scrs ~ habitat, data = island.spp_groups, FUN = "mean")
names(cent) [-1] <- colnames(scrs)
points(cent [,-1],
pch = c( 8 , 8 , 8, 8),
col = c("gray0", "gray0", "gray48", "gray48"),
bg = c("black"),
lwd = 3.0,
cex = 2.0 # Plots centroids as points on ordination
)
If you are interested in showing species scores we have the ability to do that because we have community data available.
This code:
island.spp_scrs <-
sppscores(island.spp_NMS) <- island.spp.rel
should be used if you are interested in identifying species scores on your distance-based plot. This uses the sppscores
function to identify species scores from your "island.spp.rel"
data-frame to plot in your ordination.
After calculating your species scores you can add it to your NMDS plot using the plot
function by telling the plot to display "species"
.
# Plotting points in ordination space
plot(island.spp_NMS, "sites") # Produces distance based bi-plot
plot(island.spp_NMS, "species") # Plots species scores
orditorp(island.spp_NMS, "sites") # Gives points labels
It is also important to note that species scores result can be formulated into a table to show point correlations with species scores. This can help to understand why some observations (points) are more similar in species compositions than others. Correlations with each axis can also be investigated by calculating Pearson correlation coefficients, which we will cover after plotting.
Here we use the cor
function to create a new data-frame (island.spp_cor
) that gives correlations of each species with the 3 axes we used to calculate the NMDS. We need to identify x
which will be our realtiveized matrix of plots and species (island.spp.rel
). We also need to identify y
this is the data we will be correlating with the species matrix, here we want to use the distance-based points in our NMDS. You can select for this data with the line - island.spp_NMS$points
. Next, use
is an optional character string giving a method for computing co-variances in the presence of missing values. In this example we will code for complete.obs
which will use case-wise deletions of missing values. The last element of this function is method
selection, here we will use pearson
to get Pearson correlation coefficients. Last we can write a .csv file of the correlation coefficients using the write.csv
function.
island.spp_cor <-
cor(island.spp.rel,
island.spp_NMS$points,
use = "complete.obs",
method = "pearson")
write.csv(island.spp_cor, file = "island_spp_PearsonCor.csv")
# Load necessary packages
library(vegan)
# Set working directory
setwd("~/Desktop/Ordinations in R/Chapter_1 data/")
# Read in your data
island.spp <-
read.csv("island_spp_cov.csv", row.names = 1)
island.spp.rel <- #transforming to relative abundance
decostand(island.spp, method = "total")
island.spp_groups <-
read.csv("trait_spp_hab_groups.csv", row.names = 1)
# Calculate distance matrix - make distance matrix data frame - write .csv
island.spp_distmat <-
vegdist(island.spp.rel, method = "bray")
island.spp_DistanceMatrix <-
as.matrix(island.spp_distmat, labels = T)
write.csv(island.spp_DistanceMatrix, "island_spp_distmat.csv")
# Running NMDS in vegan (metaMDS)
island.spp_NMS <-
metaMDS(island.spp_distmat,
distance = "bray",
k = 3,
maxit = 999,
trymax = 250)
island.spp_scrs <-
`sppscores<-`(island.spp_NMS, island.spp)
island.spp_cor <-
cor(island.spp,
island.spp_NMS$points,
use = "complete.obs",
method = "pearson")
write.csv(island.spp_cor, file = "island_spp_PearsonCor.csv")
# Shepards test/goodness of fit
goodness(island.spp_NMS)
stressplot(island.spp_NMS)
# Plot Island trait ordintion w/o Hog marsh - No Delta N15
plot(island.spp_NMS)
orditorp(island.spp_NMS, "sites") # Gives plot labels
with(island.spp_groups, levels(habitat)) # Sites according to Island
colvec <- c("gray0", "gray0", "gray49", "gray49")
pchvec <- c(21, 21, 22, 22)
plot(island.spp_NMS)
with(island.spp_groups,
points(island.spp_NMS,
display = "sites",
col = "black",
pch = pchvec[habitat],
bg = colvec[habitat]))
ordihull(
island.spp_NMS,
island.spp_groups$habitat,
display = "sites",
draw = c("polygon"),
col = NULL,
border = c("gray0", "gray0", "gray48", "gray48"),
lty = c(1, 2, 1, 2),
lwd = 2.5
)
# Calculating and plotting centroids of NMDS Result
scrs <-
scores(island.spp_NMS, display = "sites", "species")
cent <-
aggregate(scrs ~ habitat, data = island.spp_groups, FUN = "mean")
names(cent) [-1] <- colnames(scrs)
points(cent [,-1],
pch = c( 8 , 8 , 8, 8),
col = c("gray0", "gray0", "gray48", "gray48"),
bg = c("black"),
lwd = 3.0,
cex = 2.0 # Plots centroids as points on ordination
)