Background: How does Brachypodium rhizosphere microbes differ in the native and invaded range:
Split taxonomy column into multiple variables
# bacteria otu
B16S <- read.csv("data/filtered_table.csv", header = TRUE, stringsAsFactors = FALSE)
# split taxa groupings into new columns
source("Brachy_functions.R")
B16S <- split_taxa(B16S)
UniFrac is a measure of B-diversity that uses phylogenetic information to compare community samples. Use with multivariate methods like principal coordinates (PCoA), NMDS, etc to explain differences among communities. Measured as the amount of evolutionary history that is unique to each group - the fraction of branch length in a phylogenetic tree associated with ONLY the focal sample in pairwise comparison. With sequence data UniFrac values can be invfluencd by the number of sequences/sample and can use sequence jackknifing (Lozupone et al 2011. ISME).
Use to compare phylogenetic lineages between groups, to cluster samples.
# unifrac<-read.table('data/unweighted_unifrac_dm.txt')
# unifrac_wt<-read.table('data/weighted_unifrac_dm.txt')
unifrac <- read.table("data/weighted_unifrac_dm.txt")
Use heatmap to visualize sample dissimilarity based on UniFrac
## Error in eval(lhs, parent, parent): object 'unifrac_wt' not found
## Error in ggplot(data = unifracwt.melt, aes(x = reorder(otu_1, value), : object 'unifracwt.melt' not found
## Error in plot_grid(heat.uni, heat.wt, nrow = 1, ncol = 2): object 'heat.wt' not found
otu.taxa <- read.csv("data/OTU_taxa_id.csv") # otu ids
otu.comm <- read.csv("data/OTU_community_clean.csv")
## the samples in the unifrac dist and community need to be in the same order
## need to order community based on unifraq
# make new df based on unifrac, matched to same order as unifrac match the
# order to colnames(unifrac), rewrite over same name
otu.comm <- otu.comm[match(colnames(unifrac), otu.comm$SampleID), ]
# this says - for the community df, order the rows to match the order of the
# column names in unifrac, that match to SampleID
# reorder mapping data to community data
grps <- map6[match(colnames(unifrac), map6$SampleID), ]
# now that all the data (community, distances, grouping variables) are
# ordered the same, we can use permanova etc
Permanova tests for differences in composition among groups
Reminder - permanova is always based on pairwise distances/dissimilarities.
set.seed(304)
# unifrac distances
str(grps) #look at data types to indetify error source
## 'data.frame': 167 obs. of 10 variables:
## $ SampleID : Factor w/ 248 levels "101","103","109",..: 162 93 88 86 151 164 153 154 99 97 ...
## $ BarcodeSequence : logi NA NA NA NA NA NA ...
## $ LinkerPrimerSequence: logi NA NA NA NA NA NA ...
## $ Year : chr "2016" "2016" "2016" "2016" ...
## $ Type : Factor w/ 5 levels "A","B","C","N",..: 2 2 1 2 2 2 2 2 2 1 ...
## $ Genus : Factor w/ 5 levels "Ave","Bra","Bro",..: 3 2 3 3 3 3 2 2 3 2 ...
## $ Species : Factor w/ 7 levels "BD","BH","BM",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ Site : Factor w/ 14 levels "Alg","Ali","BIER",..: 4 7 7 7 5 4 5 5 7 7 ...
## $ Country : Factor w/ 2 levels "Spain","USA": 2 2 2 2 2 2 2 2 2 2 ...
## $ Description : Factor w/ 4 levels "P11","PN2","SP10",..: 3 3 4 3 3 3 3 3 3 4 ...
# Elevation is integer - change to factor
grps$Type <- as.factor(grps$Type)
grps$Genus <- as.factor(grps$Genus)
# ad.uni<-adonis2(unifrac~Type+Genus, data=grps, permutations=1000,
# strata='Country')
# ad.uni<-adonis2(unifrac~Type+Genus+Country, data=grps, permutations=1000)
# ad.uni
# unifrac distances with month nested in year
# ad.uniNest<-adonis2(unifrac~Genus/Species+Type, data=grps,
# permutations=1000) ad.uniNest
# unifrac distances weighted by relative abundance
# ad.uniwt<-adonis2(unifrac_wt~Genus+Type+Country, data=grps,
# permutations=1000) ad.uniwt
Manipulate code for test of interest
The NMDS often pairs well with the PERMANOVA because both are more robust to properties of community data. To gain proper fit may need to try multiple dissimilarity methods or tranformations.
## Run 0 stress 0.1172938
## Run 1 stress 0.1163358
## ... New best solution
## ... Procrustes: rmse 0.04494324 max resid 0.2454522
## Run 2 stress 0.1421087
## Run 3 stress 0.1349808
## Run 4 stress 0.1479205
## Run 5 stress 0.1320926
## Run 6 stress 0.129599
## Run 7 stress 0.1449312
## Run 8 stress 0.1273764
## Run 9 stress 0.1215013
## Run 10 stress 0.1439875
## Run 11 stress 0.1432548
## Run 12 stress 0.1277929
## Run 13 stress 0.1396788
## Run 14 stress 0.1479858
## Run 15 stress 0.1368881
## Run 16 stress 0.1266009
## Run 17 stress 0.1246037
## Run 18 stress 0.1389173
## Run 19 stress 0.1146894
## ... New best solution
## ... Procrustes: rmse 0.02332448 max resid 0.2396804
## Run 20 stress 0.1299571
## *** No convergence -- monoMDS stopping criteria:
## 20: stress ratio > sratmax
## Error in metaMDS(unifrac_wt, k = 2, autotransform = FALSE): object 'unifrac_wt' not found
Plot ordination
# get NMDS scores for each SampleID scores are returned for each sampleid as
# a row, collapses dissimilarity data into 2 dimensions
nmds.uni <- data.frame(scores(mds.uni))
nmds.uni <- nmds.uni[match(grps$SampleID, rownames(nmds.uni)), ] %>% cbind(grps) # match the rows and bind with the grouping data for plotting
# plotting data for nmds for weighted unifrac
nmds.uniwt <- data.frame(scores(mds.uniwt))
## Error in scores(mds.uniwt): object 'mds.uniwt' not found
nmds.uniwt <- nmds.uniwt[match(grps$SampleID, rownames(nmds.uniwt)), ] %>% cbind(grps) # reorder rows and bind to grouping data
## Error in eval(lhs, parent, parent): object 'nmds.uniwt' not found
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Type, shape = Genus))
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Site, shape = Description))
```
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Genus, shape = Country))
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Genus, shape = Country))
```
Manipulate plot code to make figure as desired
# Not working to assign Year as a grouping factor
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(size = Type, shape = Genus))
## Error in grid.Call.graphics(C_setviewport, vp, TRUE): non-finite location and/or size for viewport
# converting variables between numeric and factors changes how ggplot will
# plot them if hitting errors
str(nmds.uni$Genus)
## Factor w/ 5 levels "Ave","Bra","Bro",..: 3 2 3 3 3 3 2 2 3 2 ...
nmds.uni <- nmds.uni %>% transform(Type = as.factor(Type))
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(size = Type, shape = Site))
## Error in grid.Call.graphics(C_setviewport, vp, TRUE): non-finite location and/or size for viewport
# Not working to assign Year as a grouping factor
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(size = Type, color = Description))
## Error in grid.Call.graphics(C_setviewport, vp, TRUE): non-finite location and/or size for viewport
# size for discrete scale is a little funky also there are 2 different
# levels for the month of october due to spelling differences - correction
# added upstream to the grouping dataframe
# Not running
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Site, shape = Type),
size = 3) + scale_color_gradient2(low = "turquoise", mid = "yellow", high = "red",
midpoint = 1500)
## Error: Discrete value supplied to continuous scale
# to use scale_color_gradient2, the assigned variable must be continuous
# (as.numeric) a shape scale is most useful for categorical data data type
# can be: numeric, factor, integer, and character. character and factor are
# very similar but are treated slightly differently
str(nmds.uni$Elevation)
## NULL
# to convert to numeric back from factor you have to be careful going
# directly factor to numeric does not work. to preserve the values in a
# factor, convert to character, then numeric
nmds.uni$Elevation <- as.character(nmds.uni$Elevation)
## Error in `$<-.data.frame`(`*tmp*`, Elevation, value = character(0)): replacement has 0 rows, data has 167
nmds.uni$Elevation <- as.numeric(nmds.uni$Elevation)
## Error in `$<-.data.frame`(`*tmp*`, Elevation, value = numeric(0)): replacement has 0 rows, data has 167
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Elevation, shape = Site),
size = 3) + scale_color_gradient2(low = "turquoise", mid = "yellow", high = "red",
midpoint = 1500)
## Error in FUN(X[[i]], ...): object 'Elevation' not found
Separates well by elevation/site, within each elevation the months are grouped together for the most part.
ggplot(nmds.uniwt, aes(NMDS1, NMDS2)) + geom_point(aes(color = Year, shape = Month),
size = 4) + scale_color_gradient2(low = "turquoise", high = "red", midpoint = 1500)
## Error in ggplot(nmds.uniwt, aes(NMDS1, NMDS2)): object 'nmds.uniwt' not found
# Not running - you need to change Year to categorical using as.factor()
## edit shape scale - the + is hard to see
## https://ggplot2.tidyverse.org/reference/scale_shape.html
# Show a list of available shapes
df_shapes <- data.frame(shape = 0:25)
ggplot(df_shapes, aes(0, 0, shape = shape)) + geom_point(aes(shape = shape),
size = 5, fill = "red") + scale_shape_identity() + facet_wrap(~shape) +
theme_void()
# shapes in red let you manipulate the color & fill (rather than just color)
# color then corresponds to the shape outline, and fill to the inside let's
# use the fill to indicate the Year, shape for the Month, and color for
# Elevation
# let's use the fill to indicate the Year, shape for the Month
# color = Elevation
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
geom_point(aes(color=Elevation), size=3)+
scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
theme_bw()
## Error in FUN(X[[i]], ...): object 'Elevation' not found
# fill = Year
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
geom_point(aes(color=Elevation, fill=Year), size=3)+
scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
theme_bw()
## Error in FUN(X[[i]], ...): object 'Elevation' not found
# note - this is the same as before even with fill = Year added because we are using a shape that cannot be filled
# fill = Year, shape = 21
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
geom_point(aes(color=Elevation, fill=Year), shape = 21,color='black', size=3)+
scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
theme_bw()
# shape = 21 is the filled circle
# shape = Month
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
geom_point(aes(color=Elevation, fill=Year, shape=Month),color='black', size=3)+
scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
scale_shape_manual(values=c(21, 22, 24))+ # the values correspond to the shapes above
theme_bw()
## Error in FUN(X[[i]], ...): object 'Month' not found
# and to change the color of the fill:
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
geom_point(aes(color=Elevation, fill=Year), shape = 21,color='black', size=3)+
scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
scale_shape_manual(values=c(21, 22, 24))+ # the values correspond to the shapes above
scale_fill_manual(values=c('black','white'))+
theme_bw()
# lets add the stress to the plot
mds.uni$stress #the stress
## [1] 0.1146894
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
geom_point(aes(color=Elevation, fill=Year, shape=Month),color='black', size=3)+
scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
scale_shape_manual(values=c(21, 22, 24))+ # the values correspond to the shapes above
scale_fill_manual(values=c('black','white'))+
theme_bw()+
annotate('text', x=0.3, -0.25, label=paste0('Stress = ',round(mds.uni$stress,2)))
## Error in FUN(X[[i]], ...): object 'Month' not found
NMDS is an unconstrained ordination method to visualize multivariate data in fewer dimensions. Depending on the properties of your data and your questions, different methods of ordination may be appropriate. PCoA (or MDS) is a metric version of NMDS, meaning that PCoA is a Euclidean representation of dissimilarities. So, like NMDS PCoA uses dissimilarities, but it is euclidean rather than rank-order which is used to preserve distances hierarchically. NMDS and distance-based ordinations/tests are often easier to use with community data because 0’s and 1’s, rare species are common and do not configure in euclidean space.
In general, ordination & multivaraite methods are defined by (1) whether or not using dissimilarity, (2) whether or not euclidean, (3) test for differences vs explaining differences …
PCoA, like PCA, returns a set of orthogonal axes with associated eignevalues that measure their importance. PCoA and PCA should produce the same results if the PCA is calculated on a covariance matrix with scaling of 1.
plot the pcoa
Note: Can’t run any analyses that rely on relative abundance data (e.g., simper), Address using absence-presence data
This procedure identifies OTUs as indicator species independently from their abundance in the total data set. ####Steps (from https://www.nature.com/articles/ismej2015238):