Logic

Background: How does Brachypodium rhizosphere microbes differ in the native and invaded range:

Q1. Do Brachypodium-associated microbial communities differ across space and time, and by whether it originated from soil or roots?

Outputs:

1. Determine the number of new Brachypodium-associated microbial OTUs identified in the invaded and native ranges

2. NMDS of unweighted unifrac or jaccard dissimilarity. PERMANOVA with PERMDISP.

Prepare data

Split taxonomy column into multiple variables

# bacteria otu
B16S <- read.csv("data/filtered_table.csv", header = TRUE, stringsAsFactors = FALSE)

# split taxa groupings into new columns
source("Brachy_functions.R")
B16S <- split_taxa(B16S)

Read in UniFrac distances

UniFrac is a measure of B-diversity that uses phylogenetic information to compare community samples. Use with multivariate methods like principal coordinates (PCoA), NMDS, etc to explain differences among communities. Measured as the amount of evolutionary history that is unique to each group - the fraction of branch length in a phylogenetic tree associated with ONLY the focal sample in pairwise comparison. With sequence data UniFrac values can be invfluencd by the number of sequences/sample and can use sequence jackknifing (Lozupone et al 2011. ISME).
Use to compare phylogenetic lineages between groups, to cluster samples.

# unifrac<-read.table('data/unweighted_unifrac_dm.txt')
# unifrac_wt<-read.table('data/weighted_unifrac_dm.txt')
unifrac <- read.table("data/weighted_unifrac_dm.txt")

Use heatmap to visualize sample dissimilarity based on UniFrac

## Error in eval(lhs, parent, parent): object 'unifrac_wt' not found
## Error in ggplot(data = unifracwt.melt, aes(x = reorder(otu_1, value), : object 'unifracwt.melt' not found
## Error in plot_grid(heat.uni, heat.wt, nrow = 1, ncol = 2): object 'heat.wt' not found

Prep data for multivariate anaylsis

otu.taxa <- read.csv("data/OTU_taxa_id.csv")  # otu ids
otu.comm <- read.csv("data/OTU_community_clean.csv")
## the samples in the unifrac dist and community need to be in the same order
## need to order community based on unifraq

# make new df based on unifrac, matched to same order as unifrac match the
# order to colnames(unifrac), rewrite over same name
otu.comm <- otu.comm[match(colnames(unifrac), otu.comm$SampleID), ]
# this says - for the community df, order the rows to match the order of the
# column names in unifrac, that match to SampleID

# reorder mapping data to community data
grps <- map6[match(colnames(unifrac), map6$SampleID), ]
# now that all the data (community, distances, grouping variables) are
# ordered the same, we can use permanova etc

Permanova

Permanova tests for differences in composition among groups
Reminder - permanova is always based on pairwise distances/dissimilarities.

set.seed(304)
# unifrac distances
str(grps)  #look at data types to indetify error source
## 'data.frame':    167 obs. of  10 variables:
##  $ SampleID            : Factor w/ 248 levels "101","103","109",..: 162 93 88 86 151 164 153 154 99 97 ...
##  $ BarcodeSequence     : logi  NA NA NA NA NA NA ...
##  $ LinkerPrimerSequence: logi  NA NA NA NA NA NA ...
##  $ Year                : chr  "2016" "2016" "2016" "2016" ...
##  $ Type                : Factor w/ 5 levels "A","B","C","N",..: 2 2 1 2 2 2 2 2 2 1 ...
##  $ Genus               : Factor w/ 5 levels "Ave","Bra","Bro",..: 3 2 3 3 3 3 2 2 3 2 ...
##  $ Species             : Factor w/ 7 levels "BD","BH","BM",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ Site                : Factor w/ 14 levels "Alg","Ali","BIER",..: 4 7 7 7 5 4 5 5 7 7 ...
##  $ Country             : Factor w/ 2 levels "Spain","USA": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Description         : Factor w/ 4 levels "P11","PN2","SP10",..: 3 3 4 3 3 3 3 3 3 4 ...
# Elevation is integer - change to factor
grps$Type <- as.factor(grps$Type)
grps$Genus <- as.factor(grps$Genus)

# ad.uni<-adonis2(unifrac~Type+Genus, data=grps, permutations=1000,
# strata='Country')

# ad.uni<-adonis2(unifrac~Type+Genus+Country, data=grps, permutations=1000)
# ad.uni

# unifrac distances with month nested in year
# ad.uniNest<-adonis2(unifrac~Genus/Species+Type, data=grps,
# permutations=1000) ad.uniNest

# unifrac distances weighted by relative abundance
# ad.uniwt<-adonis2(unifrac_wt~Genus+Type+Country, data=grps,
# permutations=1000) ad.uniwt

Manipulate code for test of interest

NMDS

The NMDS often pairs well with the PERMANOVA because both are more robust to properties of community data. To gain proper fit may need to try multiple dissimilarity methods or tranformations.

## Run 0 stress 0.1172938 
## Run 1 stress 0.1163358 
## ... New best solution
## ... Procrustes: rmse 0.04494324  max resid 0.2454522 
## Run 2 stress 0.1421087 
## Run 3 stress 0.1349808 
## Run 4 stress 0.1479205 
## Run 5 stress 0.1320926 
## Run 6 stress 0.129599 
## Run 7 stress 0.1449312 
## Run 8 stress 0.1273764 
## Run 9 stress 0.1215013 
## Run 10 stress 0.1439875 
## Run 11 stress 0.1432548 
## Run 12 stress 0.1277929 
## Run 13 stress 0.1396788 
## Run 14 stress 0.1479858 
## Run 15 stress 0.1368881 
## Run 16 stress 0.1266009 
## Run 17 stress 0.1246037 
## Run 18 stress 0.1389173 
## Run 19 stress 0.1146894 
## ... New best solution
## ... Procrustes: rmse 0.02332448  max resid 0.2396804 
## Run 20 stress 0.1299571 
## *** No convergence -- monoMDS stopping criteria:
##     20: stress ratio > sratmax
## Error in metaMDS(unifrac_wt, k = 2, autotransform = FALSE): object 'unifrac_wt' not found

Plot ordination

# get NMDS scores for each SampleID scores are returned for each sampleid as
# a row, collapses dissimilarity data into 2 dimensions
nmds.uni <- data.frame(scores(mds.uni))
nmds.uni <- nmds.uni[match(grps$SampleID, rownames(nmds.uni)), ] %>% cbind(grps)  # match the rows and bind with the grouping data for plotting

# plotting data for nmds for weighted unifrac
nmds.uniwt <- data.frame(scores(mds.uniwt))
## Error in scores(mds.uniwt): object 'mds.uniwt' not found
nmds.uniwt <- nmds.uniwt[match(grps$SampleID, rownames(nmds.uniwt)), ] %>% cbind(grps)  # reorder rows and bind to grouping data
## Error in eval(lhs, parent, parent): object 'nmds.uniwt' not found
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Type, shape = Genus))

ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Site, shape = Description))

```

ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Genus, shape = Country))

ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Genus, shape = Country))

```

Manipulate plot code to make figure as desired

# Not working to assign Year as a grouping factor
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(size = Type, shape = Genus))
## Error in grid.Call.graphics(C_setviewport, vp, TRUE): non-finite location and/or size for viewport

# converting variables between numeric and factors changes how ggplot will
# plot them if hitting errors
str(nmds.uni$Genus)
##  Factor w/ 5 levels "Ave","Bra","Bro",..: 3 2 3 3 3 3 2 2 3 2 ...
nmds.uni <- nmds.uni %>% transform(Type = as.factor(Type))

ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(size = Type, shape = Site))
## Error in grid.Call.graphics(C_setviewport, vp, TRUE): non-finite location and/or size for viewport
# Not working to assign Year as a grouping factor
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(size = Type, color = Description))
## Error in grid.Call.graphics(C_setviewport, vp, TRUE): non-finite location and/or size for viewport

# size for discrete scale is a little funky also there are 2 different
# levels for the month of october due to spelling differences - correction
# added upstream to the grouping dataframe
# Not running
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Site, shape = Type), 
    size = 3) + scale_color_gradient2(low = "turquoise", mid = "yellow", high = "red", 
    midpoint = 1500)
## Error: Discrete value supplied to continuous scale
# to use scale_color_gradient2, the assigned variable must be continuous
# (as.numeric) a shape scale is most useful for categorical data data type
# can be: numeric, factor, integer, and character. character and factor are
# very similar but are treated slightly differently

str(nmds.uni$Elevation)
##  NULL
# to convert to numeric back from factor you have to be careful going
# directly factor to numeric does not work. to preserve the values in a
# factor, convert to character, then numeric
nmds.uni$Elevation <- as.character(nmds.uni$Elevation)
## Error in `$<-.data.frame`(`*tmp*`, Elevation, value = character(0)): replacement has 0 rows, data has 167
nmds.uni$Elevation <- as.numeric(nmds.uni$Elevation)
## Error in `$<-.data.frame`(`*tmp*`, Elevation, value = numeric(0)): replacement has 0 rows, data has 167
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = Elevation, shape = Site), 
    size = 3) + scale_color_gradient2(low = "turquoise", mid = "yellow", high = "red", 
    midpoint = 1500)
## Error in FUN(X[[i]], ...): object 'Elevation' not found

Separates well by elevation/site, within each elevation the months are grouped together for the most part.

ggplot(nmds.uniwt, aes(NMDS1, NMDS2)) + geom_point(aes(color = Year, shape = Month), 
    size = 4) + scale_color_gradient2(low = "turquoise", high = "red", midpoint = 1500)
## Error in ggplot(nmds.uniwt, aes(NMDS1, NMDS2)): object 'nmds.uniwt' not found
# Not running - you need to change Year to categorical using as.factor()
## edit shape scale - the + is hard to see
## https://ggplot2.tidyverse.org/reference/scale_shape.html

# Show a list of available shapes
df_shapes <- data.frame(shape = 0:25)
ggplot(df_shapes, aes(0, 0, shape = shape)) + geom_point(aes(shape = shape), 
    size = 5, fill = "red") + scale_shape_identity() + facet_wrap(~shape) + 
    theme_void()

# shapes in red let you manipulate the color & fill (rather than just color)
# color then corresponds to the shape outline, and fill to the inside let's
# use the fill to indicate the Year, shape for the Month, and color for
# Elevation
# let's use the fill to indicate the Year, shape for the Month

# color = Elevation
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
  geom_point(aes(color=Elevation), size=3)+
  scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
  theme_bw()
## Error in FUN(X[[i]], ...): object 'Elevation' not found
# fill = Year
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
  geom_point(aes(color=Elevation, fill=Year), size=3)+
  scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
  theme_bw()
## Error in FUN(X[[i]], ...): object 'Elevation' not found
# note - this is the same as before even with fill = Year added because we are using a shape that cannot be filled

# fill = Year, shape = 21
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
  geom_point(aes(color=Elevation, fill=Year), shape = 21,color='black', size=3)+
  scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
  theme_bw()

# shape = 21 is the filled circle

# shape = Month
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
  geom_point(aes(color=Elevation, fill=Year, shape=Month),color='black', size=3)+
  scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
  scale_shape_manual(values=c(21, 22, 24))+ # the values correspond to the shapes above
  theme_bw()
## Error in FUN(X[[i]], ...): object 'Month' not found
# and to change the color of the fill:
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
  geom_point(aes(color=Elevation, fill=Year), shape = 21,color='black', size=3)+
  scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
  scale_shape_manual(values=c(21, 22, 24))+ # the values correspond to the shapes above
  scale_fill_manual(values=c('black','white'))+
  theme_bw()

# lets add the stress to the plot
mds.uni$stress #the stress
## [1] 0.1146894
ggplot(nmds.uni, aes(NMDS1, NMDS2))+
  geom_point(aes(color=Elevation, fill=Year, shape=Month),color='black', size=3)+
  scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)+
  scale_shape_manual(values=c(21, 22, 24))+ # the values correspond to the shapes above
  scale_fill_manual(values=c('black','white'))+
  theme_bw()+
  annotate('text', x=0.3, -0.25, label=paste0('Stress = ',round(mds.uni$stress,2)))
## Error in FUN(X[[i]], ...): object 'Month' not found

NMDS vs PCoA

NMDS is an unconstrained ordination method to visualize multivariate data in fewer dimensions. Depending on the properties of your data and your questions, different methods of ordination may be appropriate. PCoA (or MDS) is a metric version of NMDS, meaning that PCoA is a Euclidean representation of dissimilarities. So, like NMDS PCoA uses dissimilarities, but it is euclidean rather than rank-order which is used to preserve distances hierarchically. NMDS and distance-based ordinations/tests are often easier to use with community data because 0’s and 1’s, rare species are common and do not configure in euclidean space.

In general, ordination & multivaraite methods are defined by (1) whether or not using dissimilarity, (2) whether or not euclidean, (3) test for differences vs explaining differences …

PCoA, like PCA, returns a set of orthogonal axes with associated eignevalues that measure their importance. PCoA and PCA should produce the same results if the PCA is calculated on a covariance matrix with scaling of 1.

plot the pcoa

Q2. If differences exist, are there particular taxa that contribute disproportionately to those differences?

Note: Can’t run any analyses that rely on relative abundance data (e.g., simper), Address using absence-presence data

Outputs:

1. Indicator species analysis to identify taxon-habitat association patterns.

This procedure identifies OTUs as indicator species independently from their abundance in the total data set. ####Steps (from https://www.nature.com/articles/ismej2015238):

  1. single- and doubleton OTUs are removed as they hold little indicator informationusing the multipatt function (number of permutations=9999) implemented in the indicspecies R package (De Cáceres et al., 2010).
  2. To account for multiple testing, P-values were corrected by calculating false discovery rates (q-values) with the q-value function implemented in the BiocLite R package (Dabney and Storney, 2014). Indicator OTUs with q<0.05 were considered significant.
  3. Indicator taxa were represented in bipartite networks by using the edge-weighted spring embedded algorithm layout implemented in Cytoscape v.3.0.2 (Shannon et al., 2003) where point biserial correlation values, a measure of species–habitat association, were used to weight the edges between nodes constituting the habitats and indicator OTUs (Hartmann et al., 2015).
  4. We further mapped these indicator OTUs on taxonomic networks generated in Cytoscape v.3.0.2 to investigate potential taxa–habitat associations (Hartmann et al., 2015). I
  5. Indicator OTUs classified at the genus level were displayed in a taxonomic tree generated in iTOL (Letunic and Bork, 2011) together with the positive point biserial correlation values associated with each habitat.
  6. To find patterns of co-occurrence among OTUs characteristic of the habitats studied, we analysed correlations among the relative abundances of all indicator OTUs (co-correlations) by calculating Spearman rank correlation values (Spearman, 1904) in R with the function corr.test implemented in the package psych. Multiple testing was accounted for by correcting P-values with a false discovery rates of 5% (q-value<0.05).
  7. Bacterial and fungal indicator OTUs that were significantly co-correlated were displayed in networks with the edge-weighted spring embedded algorithm layout implemented in Cytoscape where Spearman correlation values were used to weight the edges between the nodes representing the OTUs (Hartmann et al., 2015).”

Q3. Can the observed variation in XXX be explained by provenance? By nutrient composition?

Outputs:

1. Multiple regression on distance matrix