NMDS ordination in R

Open RStudio

go to www.github.com/collnell/GWU-visual to access workshop materials
download the GWU_multivar.R script
open it in RStudio
make sure you have the following packages installed and loaded:

# data wrangling & plotting
library(tidyverse) # dplyr and ggplot2
library(reshape2)

# ordination
library(vegan)

# spatial analyses - for later
library(sf)
library(raster)
library(USAboundaries)

# color scales
library(viridis)


theme_set(theme_classic())

read in data

## seedbank data
seedbank<-read.csv('https://raw.githubusercontent.com/collnell/GWU-visual/master/seedbank.csv')

datatable(seedbank, rownames=FALSE, extensions = 'Buttons', caption = 'Seedbank dataset', options = list(
  dom = 't', pageLength = nrow(seedbank),fillContainer=TRUE,
  scrollY = 250, scroller=TRUE
))

Multivariate data

visualize with heatmap

seedbank<-seedbank%>%mutate(ID =paste(site, area, upland_type, rep, sep='_')) # collapse sample info

# melt to long form
comm.melt<-seedbank%>%
  dplyr::select(-site:-rep)%>%
  melt(id.vars='ID')
head(comm.melt) # now each row relfects a single species at each site

# plot heatmap
ggplot(comm.melt, aes(ID, variable))+
  geom_tile(aes(fill = value))+
  labs(x='',y='')+
  theme(axis.text.x = element_text(angle=90))+
  scale_fill_gradient(low='seagreen2',high='blue', name='Count', trans='log10', na.value=NA) # adjust scale to control appearance

Ordination

Reduces multivariate data into a few variables that represent the whole dataset
Great for visualizing patterns in your data!
Different ordinations (PCA, DCA, NMDS etc.) use diff. methods of condensing variables
NOT a statistical analysis -> you can run statistics on the outputs of ordinations, but neither the process of reducing the data nor the graph that is produced represent a statistically significant takeaway unless you run accompanying tests

1.1 organize data for NMDS

# community matrix - sites x species
comm<-seedbank%>%
  column_to_rownames('ID')%>%
  dplyr::select(ABTH:VEPE2)
head(comm)

Multidimensional scaling (MDS)

Ordination method that matches dissimilarities measured in samples with the actual distances between points in the resulting ordination
The measure of the mismatch is called “stress”
Non-metric multidimensional scaling, (NMDS most commonly used method for ecological data) is “non-metric” because stress is calculated using ranks, not actual values

NMDS workflow

1.2 look at metaMDS documentation

?metaMDS

Exercise 1:

Run an NMDS of our seed bank data in 2 dimensions, using a bray-curtis dissimilarity matrix, with a maximum of 100 random starts

# template: 
nmds_output <- metaMDS( )

## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1356575 
## Run 1 stress 0.1373819 
## Run 2 stress 0.1408261 
## Run 3 stress 0.1382998 
## Run 4 stress 0.1406275 
## Run 5 stress 0.1356557 
## ... New best solution
## ... Procrustes: rmse 0.00329856  max resid 0.01617265 
## Run 6 stress 0.1360579 
## ... Procrustes: rmse 0.02796674  max resid 0.1319205 
## Run 7 stress 0.1379261 
## Run 8 stress 0.1348187 
## ... New best solution
## ... Procrustes: rmse 0.05671719  max resid 0.154943 
## Run 9 stress 0.1350495 
## ... Procrustes: rmse 0.06060198  max resid 0.1549786 
## Run 10 stress 0.1348086 
## ... New best solution
## ... Procrustes: rmse 0.003170627  max resid 0.0174545 
## Run 11 stress 0.1445199 
## Run 12 stress 0.1370284 
## Run 13 stress 0.1350496 
## ... Procrustes: rmse 0.06041243  max resid 0.1547248 
## Run 14 stress 0.1454306 
## Run 15 stress 0.1363694 
## Run 16 stress 0.1350499 
## ... Procrustes: rmse 0.0604075  max resid 0.1547496 
## Run 17 stress 0.135461 
## Run 18 stress 0.1396575 
## Run 19 stress 0.13752 
## Run 20 stress 0.1408394 
## *** No convergence -- monoMDS stopping criteria:
##     20: stress ratio > sratmax

Stress

A non-parametric regression of DISSIMILARITY on DISTANCE gives you the goodness of fit
Should decrease with increasing dimensions
Kruskal (1964) looks for the “elbow” when stress values drop dramatically

In general:
- stress <= 0.2 = Good representation of the data without prospect of misinterpretation
- stress = 0.2 – 0.3 = A little iffy
- stress >= 0.3 = Should be treated with skepticism

# visualize MDS stress
stressplot(nmds_output)

1.3 convert output to a data frame & add our sample info.

Find the missing function, and run the code to convert the dataset.
End goal: You should end up with the sample info + 2 nmds variables

Modify this code chunk:

plotting <- scores(nmds_output) %>%
  data.frame() %>%
  #function to convert the rownames back into a column named "ID" %>%
  separate(ID, into = c("site","area","upland_type","rep"), sep = "_") %>% # separate is from the tidyr package in tidyverse 
  group_by(site)

End goal:

Exercise 2: plotting NMDS with sample points & categorical variables

2.1 plotting points- plot your ordination output!

r <- ggplot(plotting, aes(x = NMDS1, y = NMDS2))  +
  geom_point(size=3) 

r

This isn’t really helpful without groups to compare! We need a way of distinguishing communities from different sites

2.2 Plot design

Make a new ggplot, modify color coding the points by site and any other aesthetics

2.3 adding ellipses

use stat_ellipse() to add ellipses for sites

These are 95% CI ellipses by default. This can be modified with the level argument in stat_ellipse()

2.4 Play!

Now we can visualize the differences in community composition between sites. But we haven’t yet looked at any within-site community differences. At each site there are “wetland” “ecotone” and “upland” plots that span the gradient from marsh to upland. Play around with color coding, geom_point symbol types, and ellipses to try and present both between and within site differences using the “site” and “area” categorical variables.

More about tidyr: (https://tidyr.tidyverse.org/)
More about this topic and related statistical tests: (http://www.rpubs.com/collnell/manova)

density contours

geom_density2d()