Logic

Background: Mangrove forests, pantropical coastal wetland ecosystems, provide important ecosystem services including fisheries nursery habitat provision, coastline protection, nutrient cycling, and other local benefits. Of global import is their ability to store carbon in organic peat belowground for hundreds to thousands of years, with more carbon per unit area than any other forest type. Mangrove carbon storage relies in part on the high primary productivity of these systems, but essential to the large and long-lived nature of this storage is the slow microbial decomposition of buried mangrove peat. In this study, we explore the relationship between carbon content and microbial community composition with sediment age and plot the slow course of buried mangrove detritus over time. At four mangrove sites with deep (< 1 m) deposits of peat in the area of La Paz, B.C.S., Mexico, we cored the sediments until rejection with a Russian peat corer, and from these cores obtained 5 cm samples at 20 cm intervals. In these samples we measured total carbon, organic carbon, nitrogen, and 14C age. We observed high percentage carbon by mass (14 ± 6%) and high C:N ratios (29 ± 7) in peat samples. Radiocarbon dates allowed us to reconstruct the accumulation and slow decomposition of organic matter over the last 5,000 years. Limitations on microbial decomposition by microbes, likely help explain this slow decomposition. [Summary of microbial results.] These results shed light on the microbial environment in which these peat deposits are preserved for long periods of time. Future work should examine the roles of distinct microbial taxa in the slow turnover of mangrove carbon. Mangrove forests, long considered detritus-based ecosystems, can only be understood when these belowground carbon cycling processes are captured. Furthermore, data on patterns of belowground carbon in these threatened systems can motivate their conservation, given the value of their ecosystem service of carbon storage, estimated to be worth on the order of 1 billion US$ in the Gulf of California’s mangroves alone.

**Q1. Does sediment from deeper, older peat samples contain lower concentrations of organic carbon and enrichment in δ13C due to microbial consumption over time.

Q2. Will peat age be correlated with a shift in the microbial community toward taxa that can consume this refractory remaining carbon and away from the community composition near the sediment surface with greater affinity to marine and terrestrial soil communities.

Outputs:

1. Determine the number of new OTUs identified at each depth per sampling event

2. NMDS of unweighted unifrac or jaccard dissimilarity. PERMANOVA with PERMDISP.

getwd()
## [1] "/Users/maltz/Documents/GitHub/MangroveSeqs2018"

Prepare data

Split taxonomy column into multiple variables

# bacteria otu
B16S<-read.csv("data/filtered_table_w_metadata.csv", header = TRUE, stringsAsFactors = FALSE)

# split taxa groupings into new columns
source('dust_functions.R')
B16S<-split_taxa(B16S)

Read in UniFrac distances

UniFrac is a measure of B-diversity that uses phylogenetic information to compare community samples. Use with multivariate methods like principal coordinates (PCoA), NMDS, etc to explain differences among communities. Measured as the amount of evolutionary history that is unique to each group - the fraction of branch length in a phylogenetic tree associated with ONLY the focal sample in pairwise comparison. With sequence data UniFrac values can be invfluencd by the number of sequences/sample and can use sequence jackknifing (Lozupone et al 2011. ISME).
Use to compare phylogenetic lineages between groups, to cluster samples.

unifrac<-read.table('data/unweighted_unifrac_dm.txt')
unifrac_wt<-read.table('data/weighted_unifrac_dm.txt')

Use heatmap to visualize sample dissimilarity based on UniFrac

Prep data for multivariate anaylsis

otu.taxa<-read.csv('data/OTU_taxa_id.csv') # otu ids
otu.comm<-read.csv('data/OTU_community_clean.csv')

#otu.grp<-otu.comm%>%left_join(map6, by='SampleID')
## the samples in the unifrq dist and community need to be in the same order
# need to order community based on unifraq
colnames(unifrac) # use this order
##  [1] "X24" "X25" "X26" "X27" "X20" "X21" "X22" "X23" "X28" "X29" "X1" 
## [12] "X3"  "X2"  "X5"  "X4"  "X7"  "X6"  "X9"  "X8"  "X11" "X10" "X13"
## [23] "X12" "X15" "X14" "X17" "X16" "X19" "X18" "X31" "X30" "X33" "X32"
# make new df based on unifraq
otu.ordered<-data.frame(SampleID = colnames(unifrac))%>%left_join(otu.comm, by='SampleID')
otu.ordered$SampleID
##  [1] X24 X25 X26 X27 X20 X21 X22 X23 X28 X29 X1  X3  X2  X5  X4  X7  X6 
## [18] X9  X8  X11 X10 X13 X12 X15 X14 X17 X16 X19 X18 X31 X30 X33 X32
## 33 Levels: X1 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X2 X20 X21 ... X9
# this works but is inefficient - there are multiple copies of the otu table which is large

# alternatively - match the order, rewrite over same name
otu.comm<-otu.comm[match(colnames(unifrac), otu.comm$SampleID),]
# this says - for the community df, order the rows to match the order of the column names in unifrac, that match to SampleID
rm(otu.ordered)
#combine with grouping variables - align mapping data with community data
#View(otu.comm)
str(otu.comm)
## 'data.frame':    33 obs. of  28594 variables:
##  $ SampleID                      : Factor w/ 33 levels "X1","X10","X11",..: 17 18 19 20 13 14 15 16 21 22 ...
##  $ X100167                       : int  NA 6 NA NA NA NA NA NA NA 3 ...
##  $ X1003755                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1007180                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1011088                      : int  6 6 NA 5 NA NA NA 5 NA 6 ...
##  $ X1011502                      : int  NA 4 NA NA NA NA NA NA NA NA ...
##  $ X1011514                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1012668                      : int  NA NA NA NA 3 3 NA NA NA NA ...
##  $ X1014492                      : int  NA 7 NA NA 9 NA NA 5 NA 5 ...
##  $ X1016465                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X101654                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1019161                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X101967                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1021233                      : int  NA NA NA NA 7 NA NA NA NA NA ...
##  $ X102272                       : int  NA NA NA NA 7 NA NA NA NA NA ...
##  $ X1024772                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X102556                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1025949                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X102653                       : int  NA NA NA NA 3 NA NA 3 NA 5 ...
##  $ X1027649                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1027802                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X102886                       : int  NA NA NA NA 18 NA NA NA NA 3 ...
##  $ X103408                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1035774                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1038403                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X103868                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1040290                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X104068                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1041464                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1044436                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1045589                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1045667                      : int  NA NA NA NA 44 17 NA NA NA 5 ...
##  $ X104681                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1050284                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1051517                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X105224                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1053775                      : int  NA NA NA 6 NA NA NA NA NA NA ...
##  $ X1054222                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1056339                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X106027                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1062089                      : int  15 86 10 15 456 98 7 54 7 165 ...
##  $ X1062748                      : int  NA NA NA NA 5 NA NA 4 NA NA ...
##  $ X1067447                      : int  5 NA NA NA NA NA NA NA NA NA ...
##  $ X106754                       : int  NA NA 5 NA NA NA NA NA NA NA ...
##  $ X10681                        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X106877                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X10691                        : int  NA NA NA NA 5 NA NA NA NA 3 ...
##  $ X1070610                      : int  NA NA NA NA 9 5 NA NA NA 3 ...
##  $ X1070749                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1071670                      : int  NA NA NA NA NA NA NA NA NA 4 ...
##  $ X1074625                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1076709                      : int  NA 6 NA NA 10 NA NA 3 NA 13 ...
##  $ X1078881                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1079938                      : int  NA NA NA NA NA NA NA 5 NA NA ...
##  $ X1082059                      : int  6 27 NA NA 19 NA 6 26 NA 17 ...
##  $ X1082294                      : int  NA NA NA NA NA NA NA NA 3 NA ...
##  $ X108405                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X108426                       : int  NA NA NA NA 4 NA NA NA NA NA ...
##  $ X1084486                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1084931                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1088265                      : int  NA NA 4 8 NA NA NA NA 5 14 ...
##  $ X1088391                      : int  8 15 NA 3 10 NA NA 3 NA 5 ...
##  $ X109060                       : int  NA NA NA NA NA NA 3 NA NA NA ...
##  $ X10923                        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X109236                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X10926                        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X109279                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X109288                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X109299                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X109302                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1093382                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X10940                        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1095353                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1096010                      : int  NA NA NA NA NA NA NA NA NA 5 ...
##  $ X1098922                      : int  NA NA NA NA 6 NA NA NA NA NA ...
##  $ X1101451                      : int  NA NA NA NA NA NA NA NA NA 3 ...
##  $ X1104662                      : int  5 4 3 NA NA NA NA 56 8 16 ...
##  $ X1104672                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1104691                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1104845                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1104911                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1105022                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X110514                       : int  NA NA NA NA 3 NA NA NA NA 12 ...
##  $ X1105269                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1105285                      : int  8 28 7 5 204 53 3 29 NA 30 ...
##  $ X1105509                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1105584                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1105884                      : int  NA NA NA NA NA NA 10 NA 6 3 ...
##  $ X1105927                      : int  4 16 NA NA 106 24 NA 8 NA 10 ...
##  $ X1106016                      : int  3 25 NA 15 271 73 3 20 NA 41 ...
##  $ X1106049                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1106130                      : int  NA NA NA NA 3 NA NA NA NA NA ...
##  $ X1106146                      : int  NA NA NA NA NA NA NA NA NA 3 ...
##  $ X1106162                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1106218                      : int  NA NA 3 NA NA NA NA NA NA NA ...
##  $ X1106238                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1106251                      : int  NA 9 NA NA 21 4 NA 8 NA 9 ...
##  $ X1106293                      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ X1106406                      : int  NA NA NA NA NA NA NA NA NA 3 ...
##   [list output truncated]
otu.grp<-otu.comm%>%left_join(map6, by='SampleID')
## Error in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)): Can't join on 'SampleID' x 'SampleID' because of incompatible types (factor / logical)
#instead of making our df larger, can reorder mapping data to community like above
grps<-map6[match(colnames(unifrac), map6$SampleID),]
grps$SampleID
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [24] NA NA NA NA NA NA NA NA NA NA
# now that all the data (community, distances, grouping variables) are ordered the same, we can use permanova etc

Permanova

Permanova tests for differences in composition among groups
Reminder - permanova is always based on pairwise distances/dissimilarities.

Month and elevation are significant in all tests. Year is not but that is probably because month is nested within year and that is not indicated in the model.

Include Core in model appropriately Or alternatively, if not interested in Core effect, constrain permutations within Core using ‘strata’
Manipulate code for test of interest

NMDS

The NMDS often pairs well with the PERMANOVA because both are more robust to properties of community data. To gain proper fit may need to try multiple dissimilarity methods or tranformations.

Plot ordination

Manipulate plot code to make figure as desired

Separates well by depth/C age, within each core grouped together for the most part. EMI is more different/distinct, and the most overlap in SJI and SJII

NMDS vs PCoA

NMDS is an unconstrained ordination method to visualize multivariate data in fewer dimensions. Depending on the properties of your data and your questions, different methods of ordination may be appropriate. PCoA (or MDS) is a metric version of NMDS, meaning that PCoA is a Euclidean representation of dissimilarities. So, like NMDS PCoA uses dissimilarities, but it is euclidean rather than rank-order which is used to preserve distances hierarchically. NMDS and distance-based ordinations/tests are often easier to use with community data because 0’s and 1’s, rare species are common and do not configure in euclidean space.

In general, ordination & multivaraite methods are defined by (1) whether or not using dissimilarity, (2) whether or not euclidean, (3) test for differences vs explaining differences …

PCoA, like PCA, returns a set of orthogonal axes with associated eignevalues that measure their importance. PCoA and PCA should produce the same results if the PCA is calculated on a covariance matrix with scaling of 1.

plot the pcoa

Q2. If differences exist, are there particular taxa that contribute disproportionately to those differences?

Note: Can run any analyses that rely on relative abundance data (e.g., simper), And/Or address using absence-presence data

Outputs:

1. Indicator species analysis to identify taxon-habitat association patterns.

This procedure identifies OTUs as indicator species independently from their abundance in the total data set. ####Steps (from https://www.nature.com/articles/ismej2015238):

  1. single- and doubleton OTUs are removed as they hold little indicator informationusing the multipatt function (number of permutations=9999) implemented in the indicspecies R package (De Cáceres et al., 2010).
  2. To account for multiple testing, P-values were corrected by calculating false discovery rates (q-values) with the q-value function implemented in the BiocLite R package (Dabney and Storney, 2014). Indicator OTUs with q<0.05 were considered significant.
  3. Indicator taxa were represented in bipartite networks by using the edge-weighted spring embedded algorithm layout implemented in Cytoscape v.3.0.2 (Shannon et al., 2003) where point biserial correlation values, a measure of species–habitat association, were used to weight the edges between nodes constituting the habitats and indicator OTUs (Hartmann et al., 2015).
  4. We further mapped these indicator OTUs on taxonomic networks generated in Cytoscape v.3.0.2 to investigate potential taxa–habitat associations (Hartmann et al., 2015). I
  5. Indicator OTUs classified at the genus level were displayed in a taxonomic tree generated in iTOL (Letunic and Bork, 2011) together with the positive point biserial correlation values associated with each habitat.
  6. To find patterns of co-occurrence among OTUs characteristic of the habitats studied, we analysed correlations among the relative abundances of all indicator OTUs (co-correlations) by calculating Spearman rank correlation values (Spearman, 1904) in R with the function corr.test implemented in the package psych. Multiple testing was accounted for by correcting P-values with a false discovery rates of 5% (q-value<0.05).
  7. Bacterial indicator OTUs that were significantly co-correlated were displayed in networks with the edge-weighted spring embedded algorithm layout implemented in Cytoscape where Spearman correlation values were used to weight the edges between the nodes representing the OTUs (Hartmann et al., 2015).”

Q3. Can the observed variation in XXX be explained by depth? By nutrient, By Carbon age?

Outputs:

1. Multiple regression on distance matrix