Open RStudio

  1. go to www.github.com/collnell/GWU-visual to access workshop materials
  2. download the GWU_multivar.R script
  3. open it in RStudio
  4. make sure you have the following packages installed and loaded:
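
The package list itself is not reproduced in this copy of the handout. As a rough sketch, assuming the workshop relies on vegan (for metaMDS), ggplot2 (for plotting) and tidyr (linked at the end), something like the following checks for and loads them. The exact set of packages is an assumption, so defer to the GWU_multivar.R script:

```r
# Assumed package list; the workshop script may use a different set
pkgs <- c("vegan", "ggplot2", "tidyr")

# Install only the packages that are not already available
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!is_installed)) install.packages(pkgs[!is_installed])

# Attach each package to the session
invisible(lapply(pkgs, library, character.only = TRUE))
```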

read in data

Multivariate data

visualize with heatmap
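
A minimal sketch of one way to draw such a heatmap. The community matrix here is simulated (the object names comm, comm_long and plot_id are made up for illustration), so swap in the workshop data once it is read in:

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# Hypothetical community matrix: 6 samples (rows) x 5 species (columns)
comm <- matrix(rpois(30, lambda = 3), nrow = 6,
               dimnames = list(paste0("plot", 1:6), paste0("sp", 1:5)))

# Reshape from wide (one column per species) to long (one row per cell)
comm_long <- as.data.frame(comm)
comm_long$plot_id <- rownames(comm)
comm_long <- pivot_longer(comm_long, cols = -plot_id,
                          names_to = "species", values_to = "count")

# One tile per sample x species combination, shaded by abundance
heat <- ggplot(comm_long, aes(x = species, y = plot_id, fill = count)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "darkgreen")
heat
```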


Ordination

1.1 organize data for NMDS
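
metaMDS expects a samples-by-species matrix of abundances, so the sample information has to be split off first. A sketch of that split on simulated data: the "site" and "area" column names follow the variables used later in this handout, while the site labels and species columns are invented for illustration:

```r
set.seed(42)
# Hypothetical stand-in for the seed bank data
seeds <- data.frame(
  site = rep(c("site1", "site2", "site3"), each = 6),
  area = rep(c("wetland", "ecotone", "upland"), times = 6),
  sp1 = rpois(18, 4), sp2 = rpois(18, 2),
  sp3 = rpois(18, 6), sp4 = rpois(18, 1)
)

# Sample info (categorical variables) and community matrix (counts only)
info_cols <- c("site", "area")
sample_info <- seeds[, info_cols]
comm <- as.matrix(seeds[, setdiff(names(seeds), info_cols)])

# Drop samples with no individuals: Bray-Curtis is undefined for empty rows
keep <- rowSums(comm) > 0
comm <- comm[keep, , drop = FALSE]
sample_info <- sample_info[keep, , drop = FALSE]
```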

Multidimensional scaling (MDS)

  • Ordination method that matches dissimilarities measured in samples with the actual distances between points in the resulting ordination
  • The measure of the mismatch is called “stress”
  • Non-metric multidimensional scaling (NMDS), the most commonly used method for ecological data, is “non-metric” because stress is calculated using ranks, not actual values
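
To make the idea concrete, here is a small sketch that compares observed dissimilarities with distances in a 2-dimensional ordination. It uses vegan's built-in dune data as a stand-in for the workshop data, and a single monoMDS run from a random start rather than the full metaMDS procedure:

```r
library(vegan)

data(dune)      # example community data shipped with vegan (stand-in only)
set.seed(123)   # monoMDS starts from a random configuration

# Observed Bray-Curtis dissimilarities between samples
diss <- vegdist(dune, method = "bray")

# One NMDS run in 2 dimensions
fit <- monoMDS(diss, k = 2)

# Euclidean distances between points in the resulting ordination
ord_dist <- dist(fit$points)

# NMDS only tries to preserve rank order, so compare ranks (Spearman)
rho <- cor(as.vector(diss), as.vector(ord_dist), method = "spearman")
rho
fit$stress   # the mismatch that remains
```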

NMDS workflow

1.2 look at metaMDS documentation
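
A quick way to do this from the console, assuming vegan is installed. The help page only opens in an interactive session, while args() prints the argument list anywhere:

```r
library(vegan)

# Open the help page (interactive sessions only)
if (interactive()) help("metaMDS", package = "vegan")

# Print the arguments and their defaults
args(metaMDS)

# The arguments relevant to Exercise 1
arg_names <- names(formals(metaMDS))
intersect(c("distance", "k", "trymax"), arg_names)
```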

Exercise 1:

Run an NMDS of our seed bank data in 2 dimensions, using a Bray-Curtis dissimilarity matrix, with a maximum of 100 random starts.

## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1356575 
## Run 1 stress 0.1373819 
## Run 2 stress 0.1408261 
## Run 3 stress 0.1382998 
## Run 4 stress 0.1406275 
## Run 5 stress 0.1356557 
## ... New best solution
## ... Procrustes: rmse 0.00329856  max resid 0.01617265 
## Run 6 stress 0.1360579 
## ... Procrustes: rmse 0.02796674  max resid 0.1319205 
## Run 7 stress 0.1379261 
## Run 8 stress 0.1348187 
## ... New best solution
## ... Procrustes: rmse 0.05671719  max resid 0.154943 
## Run 9 stress 0.1350495 
## ... Procrustes: rmse 0.06060198  max resid 0.1549786 
## Run 10 stress 0.1348086 
## ... New best solution
## ... Procrustes: rmse 0.003170627  max resid 0.0174545 
## Run 11 stress 0.1445199 
## Run 12 stress 0.1370284 
## Run 13 stress 0.1350496 
## ... Procrustes: rmse 0.06041243  max resid 0.1547248 
## Run 14 stress 0.1454306 
## Run 15 stress 0.1363694 
## Run 16 stress 0.1350499 
## ... Procrustes: rmse 0.0604075  max resid 0.1547496 
## Run 17 stress 0.135461 
## Run 18 stress 0.1396575 
## Run 19 stress 0.13752 
## Run 20 stress 0.1408394 
## *** No convergence -- monoMDS stopping criteria:
##     20: stress ratio > sratmax
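
A call of roughly the following shape prints a trace like the one above. This sketch runs on vegan's built-in dune data as a stand-in, because the seed bank object name is not shown in this copy; the object name nmds is also just a placeholder:

```r
library(vegan)

data(dune)      # stand-in community matrix (samples x species)
set.seed(123)   # random starts, so fix the seed for repeatability

nmds <- metaMDS(dune,
                distance = "bray",  # Bray-Curtis dissimilarity
                k = 2,              # number of dimensions
                trymax = 100,       # maximum number of random starts
                trace = FALSE)      # set to TRUE to print the run-by-run stress

nmds   # prints the call, the final stress and the data transformations used
```

If the trace ends with "No convergence", a common next step is to raise trymax and rerun.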

Stress

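  • Stress measures goodness of fit: it comes from a non-parametric (monotone) regression of ordination DISTANCE on observed DISSIMILARITY
  • Stress should decrease as the number of dimensions increases
  • Kruskal (1964) suggests looking for the “elbow”: the number of dimensions after which stress stops dropping dramatically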
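
The regression behind the stress value can be inspected with vegan's stressplot(), which draws a Shepard diagram (observed dissimilarity against ordination distance, with the monotone fit as a step line). A sketch, again on the built-in dune data rather than the workshop data:

```r
library(vegan)

data(dune)
set.seed(123)
nmds <- metaMDS(dune, distance = "bray", k = 2, trymax = 50, trace = FALSE)

# Shepard diagram: points close to the step line mean a good fit
stressplot(nmds)

# The stress value summarizes the scatter around that line
nmds$stress
```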

In general:
- stress < 0.2: good representation of the data, with little prospect of misinterpretation
- stress 0.2 to 0.3: a little iffy
- stress > 0.3: should be treated with skepticism
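
One way to look for the elbow is to rerun the NMDS over a range of dimensions and plot stress against k. A minimal sketch on the built-in dune data (the range 1 to 4 and the name stress_by_k are arbitrary choices for illustration):

```r
library(vegan)

data(dune)
set.seed(123)

# Fit an NMDS for each number of dimensions and keep the final stress
dims <- 1:4
stress_by_k <- sapply(dims, function(k) {
  metaMDS(dune, distance = "bray", k = k, trymax = 20, trace = FALSE)$stress
})

# Scree-style plot: look for the k where the curve flattens out
plot(dims, stress_by_k, type = "b",
     xlab = "Number of dimensions", ylab = "Stress")
abline(h = 0.2, lty = 2)   # rule-of-thumb threshold from the list above
```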


1.3 convert output to a data frame & add our sample info.

Find the missing function, and run the code to convert the dataset.
End goal: You should end up with the sample info + 2 nmds variables
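
For reference, one possible approach is sketched here on vegan's built-in dune and dune.env data, not on the seed bank data. It pulls the sample scores out of the ordination and binds them to the sample information; the object names are placeholders:

```r
library(vegan)

data(dune)       # stand-in community matrix
data(dune.env)   # stand-in sample information, rows in the same order as dune
set.seed(123)
nmds <- metaMDS(dune, distance = "bray", k = 2, trymax = 50, trace = FALSE)

# Sample (site) scores as a data frame with columns NMDS1 and NMDS2
nmds_scores <- as.data.frame(scores(nmds, display = "sites"))

# Sample info + 2 NMDS variables
nmds_df <- cbind(dune.env, nmds_scores)
head(nmds_df)
```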

Modify this code chunk:

End goal:


Exercise 2: plotting NMDS with sample points & categorical variables

2.1 plotting points: plot your ordination output!
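
A bare-bones version of this plot, sketched on made-up scores. The data frame nmds_df, its site labels and its NMDS1/NMDS2 values are simulated for illustration and are not the workshop results:

```r
library(ggplot2)

set.seed(2)
# Hypothetical NMDS output: 3 sites x 3 areas x 4 plots
nmds_df <- expand.grid(rep = 1:4,
                       area = c("wetland", "ecotone", "upland"),
                       site = c("site1", "site2", "site3"))
nmds_df$NMDS1 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$site) - 2, sd = 0.4)
nmds_df$NMDS2 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$area) - 2, sd = 0.4)

# One point per sample, no grouping yet
p <- ggplot(nmds_df, aes(x = NMDS1, y = NMDS2)) +
  geom_point()
p
```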

This isn’t really helpful without groups to compare! We need a way of distinguishing communities from different sites


2.2 Plot design

Make a new ggplot that color-codes the points by site, and modify any other aesthetics you like.
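
As a starting point, here is a sketch on simulated scores (nmds_df and its values are invented for illustration). The point size, transparency and theme are arbitrary choices to play with:

```r
library(ggplot2)

set.seed(2)
# Hypothetical NMDS output: 3 sites x 3 areas x 4 plots
nmds_df <- expand.grid(rep = 1:4,
                       area = c("wetland", "ecotone", "upland"),
                       site = c("site1", "site2", "site3"))
nmds_df$NMDS1 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$site) - 2, sd = 0.4)
nmds_df$NMDS2 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$area) - 2, sd = 0.4)

# Color by site; equal axis scaling keeps ordination distances undistorted
p_site <- ggplot(nmds_df, aes(x = NMDS1, y = NMDS2, color = site)) +
  geom_point(size = 3, alpha = 0.8) +
  coord_equal() +
  theme_bw()
p_site
```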


2.3 adding ellipses

Use stat_ellipse() to add an ellipse for each site.

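By default these are 95% data ellipses (level = 0.95, drawn assuming a multivariate t-distribution), so each one is expected to enclose roughly 95% of a site's points; it is not a confidence region for the site centroid. The coverage can be modified with the level argument in stat_ellipse().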
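
A sketch of both steps on simulated scores (nmds_df is invented for illustration). The level of 0.68 in the second plot is just an example value:

```r
library(ggplot2)

set.seed(2)
# Hypothetical NMDS output: 3 sites x 3 areas x 4 plots
nmds_df <- expand.grid(rep = 1:4,
                       area = c("wetland", "ecotone", "upland"),
                       site = c("site1", "site2", "site3"))
nmds_df$NMDS1 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$site) - 2, sd = 0.4)
nmds_df$NMDS2 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$area) - 2, sd = 0.4)

base_plot <- ggplot(nmds_df, aes(x = NMDS1, y = NMDS2, color = site)) +
  geom_point()

# Default: one ellipse per site at level = 0.95
p_ell95 <- base_plot + stat_ellipse()

# Tighter ellipses by lowering the level
p_ell68 <- base_plot + stat_ellipse(level = 0.68)

p_ell95
p_ell68
```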


2.4 Play!

Now we can visualize the differences in community composition between sites, but we haven’t yet looked at any within-site community differences. At each site there are “wetland”, “ecotone”, and “upland” plots that span the gradient from marsh to upland. Play around with color coding, geom_point symbol types, and ellipses to try to present both between-site and within-site differences using the “site” and “area” categorical variables.
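
One of many possible combinations, sketched on simulated scores (nmds_df is invented for illustration): color and ellipses for site, point shape for area. Mapping shape inside geom_point() only keeps the ellipses grouped by site rather than by every site-area combination:

```r
library(ggplot2)

set.seed(2)
# Hypothetical NMDS output: 3 sites x 3 areas x 4 plots
nmds_df <- expand.grid(rep = 1:4,
                       area = c("wetland", "ecotone", "upland"),
                       site = c("site1", "site2", "site3"))
nmds_df$NMDS1 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$site) - 2, sd = 0.4)
nmds_df$NMDS2 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$area) - 2, sd = 0.4)

p_both <- ggplot(nmds_df, aes(x = NMDS1, y = NMDS2, color = site)) +
  geom_point(aes(shape = area), size = 3) +  # shape shows within-site area
  stat_ellipse() +                           # one ellipse per site
  theme_bw()
p_both
```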

More about tidyr: (https://tidyr.tidyverse.org/)
More about this topic and related statistical tests: (http://www.rpubs.com/collnell/manova)


density contours

geom_density2d()
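
A minimal sketch of density contours as an alternative to ellipses, again on simulated scores (nmds_df is invented for illustration). Each site gets its own set of 2D kernel density contour lines:

```r
library(ggplot2)

set.seed(2)
# Hypothetical NMDS output: 3 sites x 3 areas x 4 plots
nmds_df <- expand.grid(rep = 1:4,
                       area = c("wetland", "ecotone", "upland"),
                       site = c("site1", "site2", "site3"))
nmds_df$NMDS1 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$site) - 2, sd = 0.4)
nmds_df$NMDS2 <- rnorm(nrow(nmds_df), mean = as.integer(nmds_df$area) - 2, sd = 0.4)

# Points plus density contours, both colored by site
p_dens <- ggplot(nmds_df, aes(x = NMDS1, y = NMDS2, color = site)) +
  geom_point() +
  geom_density2d()
p_dens
```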