GWU_multivar.R script## seedbank data
seedbank<-read.csv('https://raw.githubusercontent.com/collnell/GWU-visual/master/seedbank.csv')
datatable(seedbank, rownames=FALSE, extensions = 'Buttons', caption = 'Seedbank dataset', options = list(
dom = 't', pageLength = nrow(seedbank),fillContainer=TRUE,
scrollY = 250, scroller=TRUE
))seedbank<-seedbank%>%mutate(ID =paste(site, area, upland_type, rep, sep='_')) # collapse sample info
# melt to long form
comm.melt<-seedbank%>%
dplyr::select(-site:-rep)%>%
melt(id.vars='ID')
head(comm.melt) # now each row relfects a single species at each site# plot heatmap
ggplot(comm.melt, aes(ID, variable))+
geom_tile(aes(fill = value))+
labs(x='',y='')+
theme(axis.text.x = element_text(angle=90))+
scale_fill_gradient(low='seagreen2',high='blue', name='Count', trans='log10', na.value=NA) # adjust scale to control appearance# community matrix - sites x species
comm<-seedbank%>%
column_to_rownames('ID')%>%
dplyr::select(ABTH:VEPE2)
head(comm) Run an NMDS of our seed bank data in 2 dimensions, using a bray-curtis dissimilarity matrix, with a maximum of 100 random starts
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1356575
## Run 1 stress 0.1373819
## Run 2 stress 0.1408261
## Run 3 stress 0.1382998
## Run 4 stress 0.1406275
## Run 5 stress 0.1356557
## ... New best solution
## ... Procrustes: rmse 0.00329856 max resid 0.01617265
## Run 6 stress 0.1360579
## ... Procrustes: rmse 0.02796674 max resid 0.1319205
## Run 7 stress 0.1379261
## Run 8 stress 0.1348187
## ... New best solution
## ... Procrustes: rmse 0.05671719 max resid 0.154943
## Run 9 stress 0.1350495
## ... Procrustes: rmse 0.06060198 max resid 0.1549786
## Run 10 stress 0.1348086
## ... New best solution
## ... Procrustes: rmse 0.003170627 max resid 0.0174545
## Run 11 stress 0.1445199
## Run 12 stress 0.1370284
## Run 13 stress 0.1350496
## ... Procrustes: rmse 0.06041243 max resid 0.1547248
## Run 14 stress 0.1454306
## Run 15 stress 0.1363694
## Run 16 stress 0.1350499
## ... Procrustes: rmse 0.0604075 max resid 0.1547496
## Run 17 stress 0.135461
## Run 18 stress 0.1396575
## Run 19 stress 0.13752
## Run 20 stress 0.1408394
## *** No convergence -- monoMDS stopping criteria:
## 20: stress ratio > sratmax
In general:
- stress <= 0.2 = Good representation of the data without prospect of misinterpretation
- stress = 0.2 – 0.3 = A little iffy
- stress >= 0.3 = Should be treated with skepticism
Find the missing function, and run the code to convert the dataset.
End goal: You should end up with the sample info + 2 nmds variables
Modify this code chunk:
plotting <- scores(nmds_output) %>%
data.frame() %>%
#function to convert the rownames back into a column named "ID" %>%
separate(ID, into = c("site","area","upland_type","rep"), sep = "_") %>% # separate is from the tidyr package in tidyverse
group_by(site)End goal:
This isn’t really helpful without groups to compare! We need a way of distinguishing communities from different sites
Make a new ggplot, modify color coding the points by site and any other aesthetics
use stat_ellipse() to add ellipses for sites
These are 95% CI ellipses by default. This can be modified with the
level argument in stat_ellipse()
Now we can visualize the differences in community composition between sites. But we haven’t yet looked at any within-site community differences. At each site there are “wetland” “ecotone” and “upland” plots that span the gradient from marsh to upland. Play around with color coding, geom_point symbol types, and ellipses to try and present both between and within site differences using the “site” and “area” categorical variables.
More about tidyr: (https://tidyr.tidyverse.org/)
More about this topic and related statistical tests: (http://www.rpubs.com/collnell/manova)
geom_density2d()