data wrangling

Prep data for multivariate anaylsis

otu.taxa<-read.csv('data/OTU_taxa_id.csv') # otu ids
otu.comm<-read.csv('data/OTU_community_clean.csv')

#otu.grp<-otu.comm%>%left_join(map6, by='SampleID')

## the samples in the unifrac dist and community need to be in the same order
# need to order community based on unifraq
colnames(unifrac) # use this order

##  [1] "OS115" "AP114" "JJ415" "OT115" "JJ515" "OS615" "OJ615" "JT415"
##  [9] "JP315" "OJ514" "OJ515" "OJ415" "AS114" "JS415" "JP414" "OT515"
## [17] "JJ114" "OT415" "OT414" "JP515" "JT115" "AJ414" "OP315" "OS214"
## [25] "JT314" "OP515" "AT414" "OS415" "OP615" "OP614" "JS115"

# make new df based on unifrac, matched to same order as unifrac
otu.ordered<-data.frame(SampleID = colnames(unifrac))%>%left_join(otu.comm, by='SampleID')
otu.ordered$SampleID

##  [1] OS115 AP114 JJ415 OT115 JJ515 OS615 OJ615 JT415 JP315 OJ514 OJ515
## [12] OJ415 AS114 JS415 JP414 OT515 JJ114 OT415 OT414 JP515 JT115 AJ414
## [23] OP315 OS214 JT314 OP515 AT414 OS415 OP615 OP614 JS115
## 31 Levels: AJ414 AP114 AS114 AT414 JJ114 JJ415 JJ515 JP315 JP414 ... OT515

# this works but is inefficient - there are multiple copies of the otu table which is large, remember to remove this file after data wrangling to free up workspace on Rstudio (rm command works to remove files)

# alternatively - match the order, rewrite over same name
otu.comm<-otu.comm[match(colnames(unifrac), otu.comm$SampleID),]
# this says - for the community df, order the rows to match the order of the column names in the file with the unifrac dissimilarity matrix i.e., unifrac, that match to SampleID
rm(otu.ordered)

Combine with grouping variables

#combine with grouping variables - align mapping data with community data
otu.grp<-otu.comm%>%left_join(map6, by='SampleID')

#instead of making the df larger, we can reorder the mapping data to match with the community, like above
grps<-map6[match(colnames(unifrac), map6$SampleID),]
grps$SampleID

##  [1] OS115 AP114 JJ415 OT115 JJ515 OS615 OJ615 JT415 JP315 OJ514 OJ515
## [12] OJ415 AS114 JS415 JP414 OT515 JJ114 OT415 OT414 JP515 JT115 AJ414
## [23] OP315 OS214 JT314 OP515 AT414 OS415 OP615 OP614 JS115
## 31 Levels: AJ414 AP114 AS114 AT414 JJ114 JJ415 JJ515 JP315 JP414 ... OT515

# now that all the data (community, distances, grouping variables) are ordered the same, we can use permanova etc

Permanova

Permanova tests for differences in composition among groups
Reminder - permanova is always based on pairwise distances/dissimilarities.
Can either nest Month in Year with Year/Month Or, can restrict permutations within year, using strata=‘Year’

set.seed(304)

#unifrac distances
ad.uni<-adonis2(unifrac~Month+Elevation, data=grps, permutations=1000, strata='Year')
ad.uni

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 1000
## 
## adonis2(formula = unifrac ~ Month + Elevation, data = grps, permutations = 1000, strata = "Year")
##           Df SumOfSqs      R2      F   Pr(>F)    
## Month      3   1.1164 0.13439 1.5389 0.002997 ** 
## Elevation  1   0.9037 0.10878 3.7372 0.000999 ***
## Residual  26   6.2873 0.75683                    
## Total     30   8.3074 1.00000                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#unifrac distances with month nested in year
ad.uniNest<-adonis2(unifrac~Year/Month+Elevation, data=grps, permutations=1000)
ad.uniNest

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 1000
## 
## adonis2(formula = unifrac ~ Year/Month + Elevation, data = grps, permutations = 1000)
##            Df SumOfSqs      R2      F   Pr(>F)    
## Year        1   0.5142 0.06190 2.1383 0.002997 ** 
## Elevation   1   0.8993 0.10825 3.7396 0.000999 ***
## Year:Month  3   0.8822 0.10620 1.2229 0.058941 .  
## Residual   25   6.0117 0.72366                    
## Total      30   8.3074 1.00000                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#unifrac distances weighted by relative abundance
ad.uniwt<-adonis2(unifrac_wt~Month+Year+Elevation, data=grps, permutations=1000)
ad.uniwt

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 1000
## 
## adonis2(formula = unifrac_wt ~ Month + Year + Elevation, data = grps, permutations = 1000)
##           Df SumOfSqs      R2      F   Pr(>F)    
## Month      3   0.4880 0.14798 1.9345 0.020979 *  
## Year       1   0.1288 0.03907 1.5320 0.144855    
## Elevation  1   0.5787 0.17548 6.8818 0.000999 ***
## Residual  25   2.1022 0.63747                    
## Total     30   3.2976 1.00000                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## are the results the same with other (non evolutionary) dissimiarlity indices?
dist.j<-vegdist(otu.comm[,-1], method='jaccard', na.rm=TRUE)
dist.bc<-vegdist(otu.comm[,-1], method='bray', na.rm=TRUE)

ad.bc<-adonis2(dist.bc~Month+Year+Elevation, data=grps, permutations=1000, strata='Year')
ad.bc

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 1000
## 
## adonis2(formula = dist.bc ~ Month + Year + Elevation, data = grps, permutations = 1000, strata = "Year")
##           Df SumOfSqs      R2      F   Pr(>F)    
## Month      3   1.2059 0.13955 1.6232 0.003996 ** 
## Year       1   0.3061 0.03542 1.2359 0.250749    
## Elevation  1   0.9385 0.10860 3.7897 0.000999 ***
## Residual  25   6.1913 0.71643                    
## Total     30   8.6418 1.00000                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ad.j<-adonis2(dist.j~Month+Year+Elevation, data=grps, permutations=1000)
ad.j

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 1000
## 
## adonis2(formula = dist.j ~ Month + Year + Elevation, data = grps, permutations = 1000)
##           Df SumOfSqs      R2      F   Pr(>F)    
## Month      3   1.3975 0.12712 1.3982 0.005994 ** 
## Year       1   0.3881 0.03530 1.1649 0.191808    
## Elevation  1   0.8788 0.07994 2.6378 0.000999 ***
## Residual  25   8.3289 0.75764                    
## Total     30  10.9933 1.00000                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Month and elevation are significant in all tests. Year is not but that is probably because month is nested within year.

Include Year in model appropriately
Or alternatively, if wanting to constrain for the variation by Year, constrain permutations within Year using ‘strata’
#### NMDS
The NMDS often pairs well with the PERMANOVA because both are more robust to properties of community data. To gain proper fit may need to try multiple dissimilarity methods or tranformations.

Plot ordination

ggplot(nmds.uni, aes(NMDS1, NMDS2))+
  geom_point(aes(color=Elevation, shape=Month))

Plotting elevation, month, and site

ggplot(nmds.uni, aes(NMDS1, NMDS2))+
  geom_point(aes(color=Elevation, shape=Site), size=3)+
  scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)

ggplot(nmds.uniwt, aes(NMDS1, NMDS2))+
  geom_point(aes(color=Elevation, shape=Site), size=4)+
  scale_color_gradient2(low='turquoise', mid='yellow', high='red', midpoint=1500)

Separates well by elevation/site, within each elevation the months are grouped together for the most part.

NMDS vs PCoA

NMDS is an unconstrained ordination method to visualize multivariate data in fewer dimensions. Depending on the properties of your data and your questions, different methods of ordination may be appropriate. PCoA (or MDS) is a metric version of NMDS, meaning that PCoA is a Euclidean representation of dissimilarities. So, like NMDS PCoA uses dissimilarities, but it is euclidean rather than rank-order which is used to preserve distances hierarchically. NMDS and distance-based ordinations/tests are often easier to use with community data because 0’s and 1’s, rare species are common and do not configure in euclidean space.

In general, ordination & multivaraite methods are defined by (1) whether or not using dissimilarity, (2) whether or not euclidean, (3) test for differences vs explaining differences …

PCoA, like PCA, returns a set of orthogonal axes with associated eignevalues that measure their importance. PCoA and PCA should produce the same results if the PCA is calculated on a covariance matrix with scaling of 1.

#PCoA - unconstrained, euclidean
pcoa.uni<-capscale(unifrac~1,data=grps)
head(summary(pcoa.uni))

## 
## Call:
## capscale(formula = unifrac ~ 1, data = grps) 
## 
## Partitioning of squared Unknown distance:
##               Inertia Proportion
## Total           8.307          1
## Unconstrained   8.307          1
## 
## Eigenvalues, and their contribution to the squared Unknown distance 
## 
## Importance of components:
##                         MDS1   MDS2    MDS3    MDS4    MDS5    MDS6
## Eigenvalue            1.3712 0.6172 0.48296 0.44120 0.37507 0.33195
## Proportion Explained  0.1651 0.0743 0.05814 0.05311 0.04515 0.03996
## Cumulative Proportion 0.1651 0.2394 0.29750 0.35061 0.39575 0.43571
##                          MDS7    MDS8    MDS9   MDS10   MDS11   MDS12
## Eigenvalue            0.33155 0.28128 0.26035 0.24962 0.23306 0.21707
## Proportion Explained  0.03991 0.03386 0.03134 0.03005 0.02805 0.02613
## Cumulative Proportion 0.47562 0.50948 0.54082 0.57087 0.59892 0.62505
##                         MDS13  MDS14   MDS15   MDS16   MDS17   MDS18
## Eigenvalue            0.21094 0.2044 0.19919 0.19491 0.19216 0.18599
## Proportion Explained  0.02539 0.0246 0.02398 0.02346 0.02313 0.02239
## Cumulative Proportion 0.65044 0.6750 0.69902 0.72248 0.74561 0.76800
##                         MDS19   MDS20   MDS21   MDS22   MDS23   MDS24
## Eigenvalue            0.18212 0.17951 0.17740 0.17325 0.16256 0.15928
## Proportion Explained  0.02192 0.02161 0.02135 0.02085 0.01957 0.01917
## Cumulative Proportion 0.78992 0.81153 0.83289 0.85374 0.87331 0.89248
##                         MDS25   MDS26   MDS27   MDS28   MDS29   MDS30
## Eigenvalue            0.15552 0.15339 0.15087 0.14823 0.14353 0.14165
## Proportion Explained  0.01872 0.01846 0.01816 0.01784 0.01728 0.01705
## Cumulative Proportion 0.91120 0.92967 0.94783 0.96567 0.98295 1.00000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  3.973259 
## 
## 
## Species scores
## 
##      MDS1 MDS2 MDS3 MDS4 MDS5 MDS6
## Dim1                              
## Dim2                              
## Dim3                              
## Dim4                              
## Dim5                              
## Dim6                              
## ....                              
## 
## 
## Site scores (weighted sums of species scores)
## 
##          MDS1    MDS2     MDS3    MDS4     MDS5    MDS6
## OS115 -0.7415  0.1656  0.04005 -0.4076 -1.28724  1.7182
## AP114  0.6906  0.6814  0.71376 -0.5067 -0.02969 -0.6876
## JJ415  0.9565 -0.3752  0.54479  0.7769 -0.34641 -0.2737
## OT115 -0.1568 -1.3120 -0.73738 -0.9420 -0.16122 -0.6079
## JJ515  1.1682 -0.2464  0.84516  0.7170 -0.55189 -0.2705
## OS615 -0.8800 -0.3348  0.22812  0.4617 -1.03511  0.1340
## ....

pcoa.df<-data.frame(scores(pcoa.uni, display='sites'))
pcoa.df# site scores correspond to the SampleID

##              MDS1        MDS2
## OS115 -0.74150784  0.16559647
## AP114  0.69057917  0.68140546
## JJ415  0.95654989 -0.37523883
## OT115 -0.15679547 -1.31203771
## JJ515  1.16819586 -0.24636897
## OS615 -0.88000154 -0.33482448
## OJ615  0.74929599 -0.63299584
## JT415  0.72507563  0.29416305
## JP315 -0.67566941  0.31608512
## OJ514  0.87871285 -0.12487936
## OJ515  0.57853130 -1.12364192
## OJ415  0.48488862 -1.09386822
## AS114  0.02490994  0.91198409
## JS415 -0.98065675  0.39498848
## JP414  0.09809453  1.08984841
## OT515 -0.30571689 -1.45939703
## JJ114  1.05797322 -0.16478383
## OT415 -0.37346462 -1.42360670
## OT414  0.44985751  0.80011034
## JP515 -0.74251220  0.40269122
## JT115  0.80663646  0.47811206
## AJ414 -0.60026037  0.01492331
## OP315 -0.88894794 -0.30454442
## OS214  0.03104503  1.08472510
## JT314  0.71373399  0.72165153
## OP515 -0.79264025 -0.30799797
## AT414  0.72371431  0.46458444
## OS415 -0.70002762  0.25432307
## OP615 -0.77049900 -0.20989577
## OP614 -0.73749279  0.34824823
## JS115 -0.79160161  0.69064067

# bind with grouping data, in order
pcoa.df<-pcoa.df[match(grps$SampleID,rownames(pcoa.df)),]%>%cbind(grps)

Q2. If differences exist, are there particular taxa that contribute disproportionately to those differences?

Note: Can’t run any analyses that rely on relative abundance data (e.g., simper), Address using absence-presence data

Outputs:

1. Indicator species analysis to identify taxon-habitat association patterns.

This procedure identifies OTUs as indicator species independently from their abundance in the total data set. ####Steps (from https://www.nature.com/articles/ismej2015238):

single- and doubleton OTUs are removed as they hold little indicator informationusing the multipatt function (number of permutations=9999) implemented in the indicspecies R package (De Cáceres et al., 2010).
To account for multiple testing, P-values were corrected by calculating false discovery rates (q-values) with the q-value function implemented in the BiocLite R package (Dabney and Storney, 2014). Indicator OTUs with q<0.05 were considered significant.
Indicator taxa were represented in bipartite networks by using the edge-weighted spring embedded algorithm layout implemented in Cytoscape v.3.0.2 (Shannon et al., 2003) where point biserial correlation values, a measure of species–habitat association, were used to weight the edges between nodes constituting the habitats and indicator OTUs (Hartmann et al., 2015).
We further mapped these indicator OTUs on taxonomic networks generated in Cytoscape v.3.0.2 to investigate potential taxa–habitat associations (Hartmann et al., 2015). I
Indicator OTUs classified at the genus level were displayed in a taxonomic tree generated in iTOL (Letunic and Bork, 2011) together with the positive point biserial correlation values associated with each habitat.
To find patterns of co-occurrence among OTUs characteristic of the habitats studied, we analysed correlations among the relative abundances of all indicator OTUs (co-correlations) by calculating Spearman rank correlation values (Spearman, 1904) in R with the function corr.test implemented in the package psych. Multiple testing was accounted for by correcting P-values with a false discovery rates of 5% (q-value<0.05).
Bacterial and fungal indicator OTUs that were significantly co-correlated were displayed in networks with the edge-weighted spring embedded algorithm layout implemented in Cytoscape where Spearman correlation values were used to weight the edges between the nodes representing the OTUs (Hartmann et al., 2015).”

DustFigs20180731

MiaMaltz

7/31/2018

Logic

Q1. Do dust-associated microbial communities entering the Sierra Nevada differ across space and time?

Outputs:

1. Determine the number of new OTUs identified at each site per sampling event

2. NMDS of unweighted unifrac or jaccard dissimilarity. PERMANOVA with PERMDISP.

Prepare data

Read in UniFrac distances

data wrangling

Prep data for multivariate anaylsis

Combine with grouping variables

Permanova

NMDS vs PCoA

Q2. If differences exist, are there particular taxa that contribute disproportionately to those differences?

Outputs:

1. Indicator species analysis to identify taxon-habitat association patterns.

Q3. Can the observed variation in XXX be explained by provenance? By nutrient composition?

Outputs:

1. Multiple regression on distance matrix