GWU_multivar.R script# data wrangling & plotting
library(tidyverse) # dplyr and ggplot2
library(reshape2)
# ordination
library(vegan)
install.packages("devtools") # to install packages from github
library(devtools)
install_github("drmarcogir/marcoUtils") # package from developer
library(marcoUtils)
# spatial analyses
library(sf)
library(raster)
library(USAboundaries)
# color scales
library(viridis)
theme_set(theme_classic())## seedbank data
seedbank<-read.csv('https://raw.githubusercontent.com/collnell/GWU-visual/master/seedbank.csv')
datatable(seedbank, rownames=FALSE, extensions = 'Buttons', caption = 'Seedbank dataset', options = list(
dom = 't', pageLength = nrow(seedbank),fillContainer=TRUE,
scrollY = 250, scroller=TRUE
))seedbank<-seedbank%>%mutate(ID =paste(site, area, upland_type, rep, sep='_')) # collapse sample info
# melt to long form
comm.melt<-seedbank%>%
dplyr::select(-site:-rep)%>%
melt(id.vars='ID')
head(comm.melt) # now each row relfects a single species at each site# plot heatmap
ggplot(comm.melt, aes(ID, variable))+
geom_tile(aes(fill = value))+
labs(x='',y='')+
theme(axis.text.x = element_text(angle=90))+
scale_fill_gradient(low='seagreen2',high='blue', name='Count', trans='log10', na.value=NA) # adjust scale to control appearance# community matrix - sites x species
comm<-seedbank%>%
column_to_rownames('ID')%>%
dplyr::select(ABTH:VEPE2)
head(comm) Run an NMDS of our seed bank data in 2 dimensions, using a bray-curtis dissimilarity matrix, with a maximum of 100 random starts
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1356575
## Run 1 stress 0.1352551
## ... New best solution
## ... Procrustes: rmse 0.02525695 max resid 0.1400273
## Run 2 stress 0.1360606
## Run 3 stress 0.1350597
## ... New best solution
## ... Procrustes: rmse 0.01626168 max resid 0.09964186
## Run 4 stress 0.1354766
## ... Procrustes: rmse 0.06173041 max resid 0.155755
## Run 5 stress 0.1442987
## Run 6 stress 0.1369661
## Run 7 stress 0.1375207
## Run 8 stress 0.1442941
## Run 9 stress 0.1376527
## Run 10 stress 0.134809
## ... New best solution
## ... Procrustes: rmse 0.06050594 max resid 0.1535728
## Run 11 stress 0.136061
## Run 12 stress 0.1442899
## Run 13 stress 0.1374694
## Run 14 stress 0.1383137
## Run 15 stress 0.1348228
## ... Procrustes: rmse 0.003895779 max resid 0.01780682
## Run 16 stress 0.1388309
## Run 17 stress 0.1348084
## ... New best solution
## ... Procrustes: rmse 0.0005532594 max resid 0.002429822
## ... Similar to previous best
## Run 18 stress 0.1358494
## Run 19 stress 0.1351768
## ... Procrustes: rmse 0.06149191 max resid 0.154352
## Run 20 stress 0.1474878
## *** Solution reached
In general:
- stress <= 0.2 = Good representation of the data without prospect of misinterpretation
- stress = 0.2 – 0.3 = A little iffy
- stress >= 0.3 = Should be treated with skepticism
Find the missing function, and run the code to convert the dataset.
End goal: You should end up with the sample info + 2 nmds variables
Modify this code chunk:
plotting <- scores(nmds_output) %>%
data.frame() %>%
#function to convert the rownames back into a column named "ID" %>%
separate(ID, into = c("site","area","upland_type","rep"), sep = "_") %>% # separate is from the tidyr package in tidyverse
group_by(site)End goal:
This isn’t really helpful without groups to compare! We need a way of distinguishing communities from different sites
Make a new ggplot, modify color coding the points by site and any other aesthetics
use stat_ellipse() to add ellipses for sites
These are 95% CI ellipses by default. This can be modified with the
level argument in stat_ellipse()
Now we can visualize the differences in community composition between sites. But we haven’t yet looked at any within-site community differences. At each site there are “wetland” “ecotone” and “upland” plots that span the gradient from marsh to upland. Play around with color coding, geom_point symbol types, and ellipses to try and present both between and within site differences using the “site” and “area” categorical variables.
More about tidyr: (https://tidyr.tidyverse.org/)
More about this topic and related statistical tests: (http://www.rpubs.com/collnell/manova)
geom_density2d()