Error in library(vegan) : there is no package called vegan
it is because the package is not installed. Use install.packages("vegan")
and then run library(vegan)
again.library(vegan) # for vegetation and community analysis
library(dplyr) # for data manipulation
## Warning: package 'dplyr' was built under R version 3.4.4
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
It is ok to see these warnings! R is just telling you that it has read the packages correctly but they were built in different versions of R than the one you are currently using, it will not effect the function of the package!
** The data used for this example is part of the vegan package and does not need to be imported or transformed**
Before you can do any sort of ordination analysis you will need to 1. Import data from csv file 2. Transform your data from long format (multiple lines for each site) into wide format (single row for each site, species as columns)
dune <- read.csv(file = "dune.csv", header = TRUE, row.names = 1) # header = TRUE for data with column names, row.names = 1 for data where the site name is in the first column
dune.env <- read.csv(file = "dune_env.csv", header = T, row.names = 1)
#alternatively for this example you can import using
data(dune)
data("dune.env")
The data used in this example doesn’t need transformation as it is already in wide format but this is how you can do it.
After reading in the raw data, there is many rows with data for the same species but from different samples or sites, thsi needs to be transformed it into wide format. To do that you need to create a matrix that is as long as the number of samples and as wide as the number of species observed.
taxa <- unique(data$Species) #creates list of each unique species
samples <- sort(unique(data$Sample.ID)) #creates list of each unique site or sample
#make empty matrix "data1" ready to fill in
data1 <- matrix(nrow = length(samples), ncol = length(taxa), dimnames = list(samples,taxa))
for(r in 1:nrow(LL.data)){
samp <- LL.data[r, 1]
tax <- LL.data[r, 2]
data1[samp,tax] <- LL.data[r, 3]
} # 1, 2, 3 here relate the the column number in the raw data in which the sample name, species name and data are in
data1[is.na(data1)] <- 0 #convert NA's to 0
Now we have a dataset that is in wide format where each sample is its own row and each species is its own column but we need to save it as a data frame to do anyfurther analysis.
data <- as.data.frame(data1)
Now that the data is in a format that is suitable for ordination methods you can use the metaMDS
function from the vegan package to run the NMDS and envfit
to identify species or environmental variables which are driving the pattern.
dune.mds <- metaMDS(dune, distance = "bray", autotransform = FALSE)
## Run 0 stress 0.1192678
## Run 1 stress 0.1192679
## ... Procrustes: rmse 0.0001229708 max resid 0.0003715247
## ... Similar to previous best
## Run 2 stress 0.1192679
## ... Procrustes: rmse 7.501385e-05 max resid 0.0002322669
## ... Similar to previous best
## Run 3 stress 0.1183186
## ... New best solution
## ... Procrustes: rmse 0.02026874 max resid 0.06494965
## Run 4 stress 0.1183186
## ... Procrustes: rmse 6.995072e-05 max resid 0.0002238919
## ... Similar to previous best
## Run 5 stress 0.2075713
## Run 6 stress 0.1183186
## ... Procrustes: rmse 7.751641e-06 max resid 2.498121e-05
## ... Similar to previous best
## Run 7 stress 0.1192679
## Run 8 stress 0.119268
## Run 9 stress 0.1901493
## Run 10 stress 0.119268
## Run 11 stress 0.1183186
## ... Procrustes: rmse 3.382625e-05 max resid 0.0001106193
## ... Similar to previous best
## Run 12 stress 0.1183186
## ... Procrustes: rmse 2.809714e-05 max resid 8.092168e-05
## ... Similar to previous best
## Run 13 stress 0.1808911
## Run 14 stress 0.1192679
## Run 15 stress 0.1192678
## Run 16 stress 0.1183186
## ... Procrustes: rmse 3.085071e-05 max resid 7.09679e-05
## ... Similar to previous best
## Run 17 stress 0.119268
## Run 18 stress 0.1183186
## ... Procrustes: rmse 5.956215e-05 max resid 0.0001992485
## ... Similar to previous best
## Run 19 stress 0.1183186
## ... New best solution
## ... Procrustes: rmse 8.709955e-06 max resid 2.963027e-05
## ... Similar to previous best
## Run 20 stress 0.1183187
## ... Procrustes: rmse 5.967417e-05 max resid 0.0001818531
## ... Similar to previous best
## *** Solution reached
dune.mds # make note of the stress value, this shows how easy it was to condense multidimensional data into two dimensional space, below 0.2 is generally good
##
## Call:
## metaMDS(comm = dune, distance = "bray", autotransform = FALSE)
##
## global Multidimensional Scaling using monoMDS
##
## Data: dune
## Distance: bray
##
## Dimensions: 2
## Stress: 0.1183186
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation, halfchange scaling
## Species: expanded scores based on 'dune'
You can use base R plotting functions to graph ordinations which are fine and fairly maniplative but you can also ggplotting functions which are a bit more fancy and more manipulative. Here there is code to do it either way
The dune dataset has two levels of categorisation; management option and land use type. I want to display both of these on one plot so I will use different symbols to display management options and different colours to display land use types.
plot(dune.mds) # displays sites and species
plot(dune.mds, type = "n") #displays empty ordination space
points(dune.mds, display = "sites", pch = c(16, 8, 17, 11) [as.numeric(dune.env$Management)], col = c("blue", "orange", "black") [as.numeric(dune.env$Use)]) # displays site points where symbols (pch) are different management options and colour (col) are different land uses
legend("topright", legend = c(levels(dune.env$Management), levels(dune.env$Use)), pch = c(16, 8, 17, 11, 16, 16, 16), col = c("black","black","black","black","blue", "orange", "black"), bty = "n", cex = 1) # displays symbol and colour legend
legend("topleft", "stress = 0.118", bty = "n", cex = 1) # displays legend text of stress value
You can also add ellipses or hulls.
Ordiplot is an alternative to the basic plot function but will do the same thing in this case.
ordiplot(dune.mds, type = "n", main = "ellipses")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
ordiellipse(dune.mds, groups = dune.env$Management, draw = "polygon", lty = 1, col = "grey90")
ordiplot(dune.mds, type = "n", main = "hulls")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
ordihull(dune.mds, groups = dune.env$Management, draw = "polygon", lty = 1, col = "grey90")
You can also investigate the species which may be driving the site distribution pattern, referred to as intrinsic variables.
dune.spp.fit <- envfit(dune.mds, dune, permutations = 999)
head(dune.spp.fit)
## $vectors
## NMDS1 NMDS2 r2 Pr(>r)
## Achimill -0.99991 0.01317 0.4160 0.010 **
## Agrostol 0.84322 -0.53758 0.7109 0.001 ***
## Airaprae -0.23061 0.97305 0.5034 0.005 **
## Alopgeni 0.40720 -0.91334 0.4044 0.010 **
## Anthodor -0.64679 0.76267 0.5430 0.004 **
## Bellpere -0.78420 -0.62051 0.1856 0.174
## Bromhord -0.77617 -0.63052 0.2266 0.105
## Chenalbu 0.40395 -0.91478 0.1018 0.434
## Cirsarve -0.13497 -0.99085 0.0638 0.574
## Comapalu 0.85099 0.52518 0.3082 0.045 *
## Eleopalu 0.99024 0.13937 0.6264 0.002 **
## Elymrepe -0.39898 -0.91696 0.4414 0.008 **
## Empenigr -0.04253 0.99910 0.2335 0.103
## Hyporadi -0.21341 0.97696 0.4810 0.003 **
## Juncarti 0.98403 -0.17802 0.3704 0.025 *
## Juncbufo 0.27714 -0.96083 0.1739 0.200
## Lolipere -0.80008 -0.59989 0.5552 0.001 ***
## Planlanc -0.87634 0.48170 0.3897 0.015 *
## Poaprat -0.70971 -0.70449 0.6187 0.002 **
## Poatriv -0.23105 -0.97294 0.6088 0.001 ***
## Ranuflam 0.99697 0.07781 0.6625 0.002 **
## Rumeacet -0.93995 -0.34133 0.1108 0.359
## Sagiproc 0.40002 -0.91651 0.0453 0.711
## Salirepe 0.42430 0.90552 0.2732 0.068 .
## Scorautu -0.44890 0.89358 0.3288 0.038 *
## Trifprat -0.99608 0.08850 0.1175 0.349
## Trifrepe -0.98731 0.15882 0.0160 0.857
## Vicilath -0.55859 0.82944 0.1131 0.366
## Bracruta 0.55368 0.83273 0.1145 0.373
## Callcusp 0.94671 0.32207 0.5049 0.003 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
##
## $factors
## NULL
##
## $na.action
## function (object, ...)
## UseMethod("na.action")
## <bytecode: 0x0000000006dfdd48>
## <environment: namespace:stats>
ordiplot(dune.mds, type = "n", main = "intrinsic species")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
plot(dune.spp.fit, p.max = 0.01, col = "black", cex = 0.7) # change the significance level of species shown with p.max
Environmental variables can also be used with envfit
which are referred to as extrinsic variables. This works best with continuous variables of which there is only one (A1) in this dataset.If you only want to fit vector variables (continuous variables) use vectorfit
and if you only want to fit factor variables (categorical variables) use factorfit
but envfit can do this automatically.
dune.envfit <- envfit(dune.mds, dune.env, permutations = 999)
head(dune.envfit)
## $vectors
## NMDS1 NMDS2 r2 Pr(>r)
## A1 0.96473 0.26322 0.3649 0.026 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
##
## $factors
## Centroids:
## NMDS1 NMDS2
## Moisture1 -0.5101 -0.0403
## Moisture2 -0.3938 0.0139
## Moisture4 0.2765 -0.4033
## Moisture5 0.6561 0.1476
## ManagementBF -0.4534 -0.0102
## ManagementHF -0.2636 -0.1282
## ManagementNM 0.2958 0.5790
## ManagementSF 0.1506 -0.4670
## UseHayfield -0.1568 0.3248
## UseHaypastu -0.0412 -0.3370
## UsePasture 0.2854 0.0844
## Manure0 0.2958 0.5790
## Manure1 -0.2482 -0.0215
## Manure2 -0.3079 -0.1866
## Manure3 0.3101 -0.2470
## Manure4 -0.3463 -0.5583
##
## Goodness of fit:
## r2 Pr(>r)
## Moisture 0.5014 0.002 **
## Management 0.4134 0.007 **
## Use 0.1871 0.153
## Manure 0.4247 0.017 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
##
## $na.action
## function (object, ...)
## UseMethod("na.action")
## <bytecode: 0x0000000006dfdd48>
## <environment: namespace:stats>
ordiplot(dune.mds, type = "n", main = "extrinsic variables")
orditorp(dune.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(dune.env$Management)], col = c("green", "blue", "orange", "black") [as.numeric(dune.env$Management)], cex = 1)
plot(dune.envfit, col = "black", cex = 0.7)
For some more options and info on base R ordination plotting see this tutorial from Jon Lefcheck https://jonlefcheck.net/2012/10/24/nmds-tutorial-in-r/ or some more examples and community ecology analysis tips by David Zelený https://www.davidzeleny.net/anadat-r/doku.php/en:ordiagrams_examples
Although the base plotting functions in R are suitable for ordination plots, ggplot can provide some more functionality and different aesthetics more easily. ggplot does require that data for plotting is all saved as a data frame.
The following example is based off code provided by Christopher Chizinski but has been modified to use the same Dune dataset as above https://www.r-bloggers.com/reinventing-the-wheel-for-ordination-biplots-with-ggplot2/
For this example you need the ggplot2 and vegan packages
library(ggplot2) # for pretty plots
library(vegan)
Import data
data("dune")
data("dune.env")
The mds and envfit functions are run using the same method above
dune.mds <- metaMDS(dune, distance = "bray", autotransform = F)
dune.envfit <- envfit(dune.mds, dune.env, permutations = 999) # this fits environmental vectors
dune.spp.fit <- envfit(dune.mds, dune, permutations = 999) # this fits species vectors
To plot the output from the mds using ggplot a new datasheet needs to be created which contains the x,y points for each site. You can do this by calling the scores of you mds.
site.scrs <- as.data.frame(scores(dune.mds, display = "sites")) #save NMDS results into dataframe
site.scrs <- cbind(site.scrs, Management = dune.env$Management) #add grouping variable "Management" to dataframe
site.scrs <- cbind(site.scrs, Landuse = dune.env$Use) #add grouping variable of cluster grouping to dataframe
#site.scrs <- cbind(site.scrs, Site = rownames(site.scrs)) #add site names as variable if you want to display on plot
head(site.scrs)
## NMDS1 NMDS2 Management Landuse
## 1 -0.84051175 -0.71587030 SF Haypastu
## 2 -0.50485164 -0.40893973 BF Haypastu
## 3 -0.08266112 -0.43668223 SF Haypastu
## 4 -0.11561563 -0.52223983 SF Haypastu
## 5 -0.62655111 -0.08670437 HF Hayfield
## 6 -0.54270296 0.11315055 HF Haypastu
A new dataset containing species data also needs to be made to look at species vectors.
This is not necessary if you don’t want to show the species on the final graph.
spp.scrs <- as.data.frame(scores(dune.spp.fit, display = "vectors")) #save species intrinsic values into dataframe
spp.scrs <- cbind(spp.scrs, Species = rownames(spp.scrs)) #add species names to dataframe
spp.scrs <- cbind(spp.scrs, pval = dune.spp.fit$vectors$pvals) #add pvalues to dataframe so you can select species which are significant
#spp.scrs<- cbind(spp.scrs, abrev = abbreviate(spp.scrs$Species, minlength = 6)) #abbreviate species names
sig.spp.scrs <- subset(spp.scrs, pval<=0.05) #subset data to show species significant at 0.05
head(spp.scrs)
## NMDS1 NMDS2 Species pval
## Achimill -0.6448960 0.008496538 Achimill 0.010
## Agrostol 0.7109566 -0.453256126 Agrostol 0.001
## Airaprae -0.1636148 0.690374853 Airaprae 0.005
## Alopgeni 0.2589360 -0.580794776 Alopgeni 0.010
## Anthodor -0.4766061 0.561991023 Anthodor 0.004
## Bellpere -0.3378215 -0.267308503 Bellpere 0.174
To show environmental extrinsic variables another datasheet needs to be created
env.scores.dune <- as.data.frame(scores(dune.envfit, display = "vectors")) #extracts relevant scores from envifit
env.scores.dune <- cbind(env.scores.dune, env.variables = rownames(env.scores.dune)) #and then gives them their names
env.scores.dune <- cbind(env.scores.dune, pval = dune.envfit$vectors$pvals) # add pvalues to dataframe
sig.env.scrs <- subset(env.scores.dune, pval<=0.05) #subset data to show variables significant at 0.05
head(env.scores.dune)
## NMDS1 NMDS2 env.variables pval
## A1 0.5828059 0.1590172 A1 0.026
Now we have the relevant information for plotting the ordination in ggplot! Lets get plotting!
nmds.plot.dune <- ggplot(site.scrs, aes(x=NMDS1, y=NMDS2))+ #sets up the plot
geom_point(aes(NMDS1, NMDS2, colour = factor(site.scrs$Management), shape = factor(site.scrs$Landuse)), size = 2)+ #adds site points to plot, shape determined by Landuse, colour determined by Management
coord_fixed()+
theme_classic()+
theme(panel.background = element_rect(fill = NA, colour = "black", size = 1, linetype = "solid"))+
labs(colour = "Management", shape = "Landuse")+ # add legend labels for Management and Landuse
theme(legend.position = "right", legend.text = element_text(size = 12), legend.title = element_text(size = 12), axis.text = element_text(size = 10)) # add legend at right of plot
nmds.plot.dune + labs(title = "Basic ordination plot") #displays plot
The thing with ggplot is that you can add many layers onto the same plot using different geom_
options as well as manipulating every graphical feature and graph space.
Here are some other options which could be added to the ordination plot.
Significant Species
nmds.plot.dune+
geom_segment(data = sig.spp.scrs, aes(x = 0, xend=NMDS1, y=0, yend=NMDS2), arrow = arrow(length = unit(0.25, "cm")), colour = "grey10", lwd=0.3) + #add vector arrows of significant species
ggrepel::geom_text_repel(data = sig.spp.scrs, aes(x=NMDS1, y=NMDS2, label = Species), cex = 3, direction = "both", segment.size = 0.25)+ #add labels for species, use ggrepel::geom_text_repel so that labels do not overlap
labs(title = "Ordination with species vectors")
Significant Environmental Variables
nmds.plot.dune+
geom_segment(data = sig.env.scrs, aes(x = 0, xend=NMDS1, y=0, yend=NMDS2), arrow = arrow(length = unit(0.25, "cm")), colour = "grey10", lwd=0.3) + #add vector arrows of significant env variables
ggrepel::geom_text_repel(data = sig.env.scrs, aes(x=NMDS1, y=NMDS2, label = env.variables), cex = 4, direction = "both", segment.size = 0.25)+ #add labels for env variables
labs(title="Ordination with environmental vectors")
Ellipses
taken from the excellent stackoverflow Q+A: http://stackoverflow.com/questions/13794419/plotting-ordiellipse-function-from-vegan-package-onto-nmds-plot-created-in-ggplo
# function for ellipsess
veganCovEllipse <- function (cov, center = c(0, 0), scale = 1, npoints = 100)
{
theta <- (0:npoints) * 2 * pi/npoints
Circle <- cbind(cos(theta), sin(theta))
t(center + scale * t(Circle %*% chol(cov)))
}
#data for ellipse, in this case using the management factor
df_ell.dune.management <- data.frame() #sets up a data frame before running the function.
for(g in levels(site.scrs$Management)){
df_ell.dune.management <- rbind(df_ell.dune.management, cbind(as.data.frame(with(site.scrs [site.scrs$Management==g,],
veganCovEllipse(cov.wt(cbind(NMDS1,NMDS2),wt=rep(1/length(NMDS1),length(NMDS1)))$cov,center=c(mean(NMDS1),mean(NMDS2))))) ,Management=g))
}
# data for labelling the ellipse
NMDS.mean.dune=aggregate(site.scrs[ ,c("NMDS1", "NMDS2")],
list(group = site.scrs$Management), mean)
# data for labelling the ellipse
NMDS.mean=aggregate(site.scrs[,c("NMDS1", "NMDS2")],
list(group = site.scrs$Management), mean)
nmds.plot.dune+
geom_path(data = df_ell.dune.management, aes(x = NMDS1, y = NMDS2, group = Management)) #this is the ellipse, seperate ones by Site.
hulls
See Chris Chizinski’s worked example to see how to create data and add hulls to ordination plot
https://www.r-bloggers.com/reinventing-the-wheel-for-ordination-biplots-with-ggplot2/
Also see Chris’ worked example for ordisurf
to add information about continuous variables to the ordination plot
https://chrischizinski.github.io/rstats/ordisurf/
For some extra tips and tricks with ggplot have a look at this cheat sheet from RStudio https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
As with anything in R there are many ways to create ggplot ordinations.
Here is another example using the ggord
package from Marcus Beck which seems pretty stright forward as well and maybe doesn’t require as much fidling with data as this example https://www.r-bloggers.com/reinventing-the-wheel-for-ordination-biplots-with-ggplot2/
Another example has been taken from Olivia Burge and uses the same Dune dataset as above https://oliviarata.wordpress.com/2014/04/17/ordinations-in-ggplot2/
sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_0.8.0.1 ggplot2_3.2.1 vegan_2.5-4 lattice_0.20-38
## [5] permute_0.9-5
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.1 compiler_3.4.3 pillar_1.4.2 tools_3.4.3
## [5] digest_0.6.18 evaluate_0.14 tibble_2.1.1 gtable_0.3.0
## [9] nlme_3.1-137 mgcv_1.8-28 pkgconfig_2.0.3 rlang_0.3.4
## [13] Matrix_1.2-17 ggrepel_0.8.0 yaml_2.2.0 parallel_3.4.3
## [17] xfun_0.10 withr_2.1.2 stringr_1.4.0 cluster_2.0.8
## [21] knitr_1.25 grid_3.4.3 tidyselect_0.2.5 glue_1.3.1
## [25] R6_2.4.0 rmarkdown_1.16 purrr_0.3.2 magrittr_1.5
## [29] scales_1.0.0 htmltools_0.3.6 MASS_7.3-51.4 splines_3.4.3
## [33] assertthat_0.2.1 colorspace_1.4-1 labeling_0.3 stringi_1.4.3
## [37] lazyeval_0.2.2 munsell_0.5.0 crayon_1.3.4