Logic

Background: Mangrove forests provide important ecosystem services including fisheries nursery habitat, coastline protection, runoff purification, and other local benefits. Of global import is their ability to store carbon in organic peat belowground for thousands of years, with more carbon per unit area than terrestrial tropical forests. Mangrove carbon storage relies in part on their high primary productivity, but essential to the large and long-lived nature of this storage is the slow rate of microbial decomposition of buried mangrove peat. In this study, we ask how carbon and nitrogen contents and microbial community composition vary with peat age and describe the formation of these mangrove peat deposits over time. At four peat-containing mangrove Cores in the area of La Paz, B.C.S., Mexico, we cored the sediments until rejection with a Russian peat corer and from these cores obtained 5 cm samples at 20 cm intervals. In these samples we measured organic carbon, nitrogen, and 14C age. We observed high carbon concentrations (3.4×10-2±1.0×10-2 g/cm3) and Corg:N ratios (42 ± 15) in peat samples and inter-Core difference in Corg:N that reflect differing preservation conditions. Recalcitrant organic matter sources and anaerobic conditions create a strong imprint on peat microbial communities. Radiocarbon dating allowed us to reconstruct the accumulation of organic matter over the last 5,000 years. Even over these long time scales, though microbes evidently have continuously cycled the peat nitrogen pool, carbon stocks remain effectively untouched over time. Future work should examine the roles of distinct microbial taxa and metabolisms in the slow turnover of mangrove carbon. Mangrove forests, long considered detritus-based ecosystems, can only be understood when belowground carbon cycling processes are captured at sufficient resolution, capturing the variation across these dynamic estuarine landscapes. Furthermore, data on patterns of belowground carbon in these threatened systems can motivate their conservation, given the value of their ecosystem service of carbon storage, estimated to be worth on the order of 1 billion US$ in the Gulf of California’s mangroves alone.

Here, we study variation in sediment carbon, nitrogen, and microbial community composition with depth, and thus age of accumulated sediment.

We hypothesized that sediment from deeper, older peat samples will contain lower concentrations of organic carbon and enrichment in δ13C and δ15N due to microbial degradation and denitrification over time. We also hypothesized that peat age will be correlated with a shift in the microbial community toward taxa that can consume this refractory remaining carbon and away from the community composition near the sediment surface with greater affinity to marine and terrestrial soil communities.

Q1. Mangrove sediments store blue carbon, but what is the role of the microbial community in nutrient cycling in these substrates at depth?

Outputs:

1. Determine the number of new OTUs identified at each depth interval per core and by substrate.

2. NMDS of unweighted unifrac or jaccard dissimilarity. PERMANOVA with PERMDISP.

Prepare data

Split taxonomy column into multiple variables

# bacteria otu
B16S <- read.csv("filtered_table_w_metadataMngrv.csv", header = TRUE, stringsAsFactors = FALSE)

# bacteria otu B16S<-read.table('filtered_table_w_metadataMangrove.txt',
# header = TRUE, stringsAsFactors = FALSE)

# split taxa groupings into new columns
source("Mangrove_functions.R")
B16S <- split_taxa(B16S)

# when reading in a .csv file, B16S has the first column as OTUID and then
# all subsequent columns (ID# = ) X1, X2 etc. instead of 1, 2 ....

# is there something wrong with my function? I don't have anything for
# species level... even though on the OTU table I have species for some for
# sure, e.g., OTU ID# 334906 should be: iriomotensis , as in: k__Bacteria;
# p__Actinobacteria; c__Actinobacteria; o__Actinomycetales;
# f__Thermomonosporaceae; g__Actinoallomurus; s__iriomotensis

Read in UniFrac distances

UniFrac is a measure of B-diversity that uses phylogenetic information to compare community samples. Use with multivariate methods like principal coordinates (PCoA), NMDS, etc to explain differences among communities. Measured as the amount of evolutionary history that is unique to each group - the fraction of branch length in a phylogenetic tree associated with ONLY the focal sample in pairwise comparison. With sequence data UniFrac values can be invfluencd by the number of sequences/sample and can use sequence jackknifing (Lozupone et al 2011. ISME).
Use to compare phylogenetic lineages between groups, to cluster samples.

unifrac <- read.table("unweighted_unifrac_dm.txt")
unifrac_wt <- read.table("weighted_unifrac_dm.txt")

Use heatmap to visualize sample dissimilarity based on UniFrac

Prep data for multivariate anaylsis

otu.taxa <- read.csv("OTU_taxa_id.csv")  # otu ids
otu.comm <- read.csv("OTU_community_clean.csv")

# otu.grp<-otu.comm%>%left_join(map6, by='SampleID')
## the samples in the unifrac dist and community need to be in the same order
## need to order community based on unifrac
colnames(unifrac)  # use this order
##  [1] "X2"  "X1"  "X7"  "X9"  "X12" "X4"  "X5"  "X16" "X20" "X13" "X10"
## [12] "X23" "X17" "X18" "X11" "X21" "X6"  "X29" "X19" "X27" "X15" "X8" 
## [23] "X25" "X33" "X24" "X28" "X26" "X30" "X31" "X14"
# make new df based on unifraq
otu.ordered <- data.frame(SampleID = colnames(unifrac)) %>% left_join(otu.comm, 
    by = "SampleID")
otu.ordered$SampleID
##  [1] "X2"  "X1"  "X7"  "X9"  "X12" "X4"  "X5"  "X16" "X20" "X13" "X10"
## [12] "X23" "X17" "X18" "X11" "X21" "X6"  "X29" "X19" "X27" "X15" "X8" 
## [23] "X25" "X33" "X24" "X28" "X26" "X30" "X31" "X14"
# this works but is inefficient - there are multiple copies of the otu table
# which is large

# alternatively - match the order, rewrite over same name
otu.comm <- otu.comm[match(colnames(unifrac), otu.comm$SampleID), ]
# this says - for the community df, order the rows to match the order of the
# column names in unifrac, that match to SampleID
rm(otu.ordered)
# combine with grouping variables - align mapping data with community data
otu.grp <- otu.comm %>% left_join(map6, by = "SampleID")
## Error in left_join_impl(x, y, by_x, by_y, aux_x, aux_y, na_matches): Can't join on 'SampleID' x 'SampleID' because of incompatible types (integer / factor)
View(map6)
# instead of making our df larger, can reorder mapping data to community
# like above
grps <- map6[match(colnames(unifrac), map6$SampleID), ]
grps$SampleID
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [24] NA NA NA NA NA NA NA
# now that all the data (community, distances, grouping variables) are
# ordered the same, we can use permanova etc

Permanova

Permanova tests for differences in composition among groups
Reminder - permanova is always based on pairwise distances/dissimilarities.

set.seed(304)

# unifrac distances
ad.uni <- adonis2(unifrac ~ BulkDensity + N15 + CtoN, data = grps, permutations = 1000, 
    strata = "N15")
## Error in qr.fitted(qrhs, G): 'qr' and 'y' must have the same number of rows
ad.uni
## Error in eval(expr, envir, enclos): object 'ad.uni' not found
# unifrac distances weighted by relative abundance
ad.uniwt <- adonis2(unifrac_wt ~ BulkDensity + N15 + CtoN, data = grps, permutations = 1000)
## Error in qr.fitted(qrhs, G): 'qr' and 'y' must have the same number of rows
ad.uniwt
## Error in eval(expr, envir, enclos): object 'ad.uniwt' not found
## are the results the same with other (non evolutionary) dissimiarlity
## indices?
dist.j <- vegdist(otu.comm[, -1], method = "jaccard", na.rm = TRUE)
dist.bc <- vegdist(otu.comm[, -1], method = "bray", na.rm = TRUE)

ad.bc <- adonis2(dist.bc ~ BulkDensity + N15 + CtoN, data = grps, permutations = 1000, 
    strata = "N15")
## Error in qr.fitted(qrhs, G): 'qr' and 'y' must have the same number of rows
ad.bc
## Error in eval(expr, envir, enclos): object 'ad.bc' not found
ad.j <- adonis2(dist.j ~ BulkDensity + N15 + CtoN, data = grps, permutations = 1000)
## Error in qr.fitted(qrhs, G): 'qr' and 'y' must have the same number of rows
ad.j
## Error in eval(expr, envir, enclos): object 'ad.j' not found

BulkDensity and CtoN are significant in all tests. N15 is not but that is probably because BulkDensity is nested within N15 and that is not indicated in the model.

Include N15 in model appropriately Or alternatively, if not interested in N15 effect, constrain permutations within N15 using ‘strata’
Manipulate code for test of interest

NMDS

The NMDS often pairs well with the PERMANOVA because both are more robust to properties of community data. To gain proper fit may need to try multiple dissimilarity methods or tranformations.

# metaMDS can take communty data or dissimilarities
mds.uni <- metaMDS(unifrac, k = 2, autotransform = FALSE)
## Run 0 stress 0.08536618 
## Run 1 stress 0.08536618 
## ... Procrustes: rmse 1.651416e-06  max resid 3.251338e-06 
## ... Similar to previous best
## Run 2 stress 0.2297191 
## Run 3 stress 0.08536619 
## ... Procrustes: rmse 6.177025e-06  max resid 2.502174e-05 
## ... Similar to previous best
## Run 4 stress 0.08536618 
## ... Procrustes: rmse 1.598217e-06  max resid 3.22727e-06 
## ... Similar to previous best
## Run 5 stress 0.2006335 
## Run 6 stress 0.1248435 
## Run 7 stress 0.08536619 
## ... Procrustes: rmse 3.475035e-06  max resid 1.2838e-05 
## ... Similar to previous best
## Run 8 stress 0.08536618 
## ... New best solution
## ... Procrustes: rmse 3.805398e-06  max resid 1.534849e-05 
## ... Similar to previous best
## Run 9 stress 0.08536618 
## ... Procrustes: rmse 1.372962e-06  max resid 3.592504e-06 
## ... Similar to previous best
## Run 10 stress 0.2006335 
## Run 11 stress 0.08536618 
## ... Procrustes: rmse 3.399884e-06  max resid 1.065011e-05 
## ... Similar to previous best
## Run 12 stress 0.08536619 
## ... Procrustes: rmse 1.873373e-05  max resid 7.77115e-05 
## ... Similar to previous best
## Run 13 stress 0.08536618 
## ... Procrustes: rmse 2.350118e-06  max resid 1.122233e-05 
## ... Similar to previous best
## Run 14 stress 0.08536618 
## ... Procrustes: rmse 2.752654e-06  max resid 1.039983e-05 
## ... Similar to previous best
## Run 15 stress 0.08536618 
## ... Procrustes: rmse 1.923558e-06  max resid 5.628593e-06 
## ... Similar to previous best
## Run 16 stress 0.08536618 
## ... Procrustes: rmse 1.087482e-06  max resid 3.134626e-06 
## ... Similar to previous best
## Run 17 stress 0.08536618 
## ... Procrustes: rmse 1.9459e-06  max resid 7.323164e-06 
## ... Similar to previous best
## Run 18 stress 0.1248502 
## Run 19 stress 0.08536618 
## ... Procrustes: rmse 2.980382e-06  max resid 1.21318e-05 
## ... Similar to previous best
## Run 20 stress 0.1926134 
## *** Solution reached
mds.uniwt <- metaMDS(unifrac_wt, k = 2, autotransform = FALSE)
## Run 0 stress 0.05921919 
## Run 1 stress 0.06676251 
## Run 2 stress 0.08280309 
## Run 3 stress 0.07098615 
## Run 4 stress 0.05921908 
## ... New best solution
## ... Procrustes: rmse 0.0001338584  max resid 0.0004995707 
## ... Similar to previous best
## Run 5 stress 0.05921907 
## ... New best solution
## ... Procrustes: rmse 6.827362e-05  max resid 0.0002826464 
## ... Similar to previous best
## Run 6 stress 0.05921934 
## ... Procrustes: rmse 0.0002129642  max resid 0.0005879745 
## ... Similar to previous best
## Run 7 stress 0.06676257 
## Run 8 stress 0.06343975 
## Run 9 stress 0.07635018 
## Run 10 stress 0.06676291 
## Run 11 stress 0.06676314 
## Run 12 stress 0.06676304 
## Run 13 stress 0.05921938 
## ... Procrustes: rmse 9.160247e-05  max resid 0.0003376133 
## ... Similar to previous best
## Run 14 stress 0.0634396 
## Run 15 stress 0.06343912 
## Run 16 stress 0.05921914 
## ... Procrustes: rmse 4.801026e-05  max resid 0.000182763 
## ... Similar to previous best
## Run 17 stress 0.07634995 
## Run 18 stress 0.06676301 
## Run 19 stress 0.07635074 
## Run 20 stress 0.08455778 
## *** Solution reached
stressplot(mds.uni)

stressplot(mds.uniwt)

## both have good stress(below .15)

Plot ordination

# get NMDS scores for each SampleID
scores(mds.uni)  # scores are returned for each sampleid as a row, collapses dissimilarity data into 2 dimensions
##           NMDS1       NMDS2
## X2  -0.28775025 -0.13636331
## X1  -0.21052517 -0.25651506
## X7  -0.29340263 -0.02860367
## X9  -0.18863356 -0.06084573
## X12  0.19632677 -0.21498448
## X4  -0.19484463 -0.21251022
## X5  -0.12323409 -0.26201907
## X16 -0.37665473  0.25096181
## X20 -0.08661779  0.39608348
## X13 -0.02034087 -0.12582273
## X10 -0.11855444 -0.08975116
## X23  0.10625527  0.13946189
## X17 -0.27237584  0.11689183
## X18  0.01214147 -0.28325367
## X11 -0.03608987 -0.10270502
## X21 -0.04552938  0.38267449
## X6  -0.26749032 -0.04202157
## X29  0.18759063  0.10459794
## X19 -0.10871454  0.41728590
## X27  0.25107609  0.07298581
## X15 -0.06893066 -0.09858800
## X8  -0.13806657 -0.07660505
## X25  0.08795354  0.14833137
## X33  0.08912667  0.11908314
## X24  0.16347244  0.08268772
## X28  0.34801937 -0.03518614
## X26  0.32886958  0.08029336
## X30  0.45369862 -0.05990652
## X31  0.47507197 -0.02661732
## X14  0.13815294 -0.19903999
nmds.uni <- data.frame(scores(mds.uni))
rownames(nmds.uni)  # match the rows with the grouping data for plotting
##  [1] "X2"  "X1"  "X7"  "X9"  "X12" "X4"  "X5"  "X16" "X20" "X13" "X10"
## [12] "X23" "X17" "X18" "X11" "X21" "X6"  "X29" "X19" "X27" "X15" "X8" 
## [23] "X25" "X33" "X24" "X28" "X26" "X30" "X31" "X14"
nmds.uni <- nmds.uni[match(grps$SampleID, rownames(nmds.uni)), ] %>% cbind(grps)  # then add that df, left_join would have similar effect
head(nmds.uni)
##      NMDS1 NMDS2 SampleID BarcodeSequence LinkerPrimerSequence PlotName
## NA      NA    NA       NA              NA                   NA     <NA>
## NA.1    NA    NA       NA              NA                   NA     <NA>
## NA.2    NA    NA       NA              NA                   NA     <NA>
## NA.3    NA    NA       NA              NA                   NA     <NA>
## NA.4    NA    NA       NA              NA                   NA     <NA>
## NA.5    NA    NA       NA              NA                   NA     <NA>
##      Core DepthBottom MCDescriptor Carbon C13org CODensity CAPmp Ndensity
## NA   <NA>          NA         <NA>     NA     NA        NA    NA       NA
## NA.1 <NA>          NA         <NA>     NA     NA        NA    NA       NA
## NA.2 <NA>          NA         <NA>     NA     NA        NA    NA       NA
## NA.3 <NA>          NA         <NA>     NA     NA        NA    NA       NA
## NA.4 <NA>          NA         <NA>     NA     NA        NA    NA       NA
## NA.5 <NA>          NA         <NA>     NA     NA        NA    NA       NA
##      N15 CtoN BulkDensity Description
## NA    NA   NA          NA        <NA>
## NA.1  NA   NA          NA        <NA>
## NA.2  NA   NA          NA        <NA>
## NA.3  NA   NA          NA        <NA>
## NA.4  NA   NA          NA        <NA>
## NA.5  NA   NA          NA        <NA>
# weighted unifrac
nmds.uniwt <- data.frame(scores(mds.uniwt))
nmds.uniwt <- nmds.uniwt[match(grps$SampleID, rownames(nmds.uniwt)), ] %>% cbind(grps)
ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = CtoN, shape = BulkDensity))
## Error: A continuous variable can not be mapped to shape

Manipulate plot code to make figure as desired

ggplot(nmds.uni, aes(NMDS1, NMDS2)) + geom_point(aes(color = CtoN, shape = Core), 
    size = 3) + scale_color_gradient2(low = "turquoise", mid = "yellow", high = "red", 
    midpoint = 1500)

ggplot(nmds.uniwt, aes(NMDS1, NMDS2)) + geom_point(aes(color = CtoN, shape = Core), 
    size = 3) + scale_color_gradient2(low = "turquoise", mid = "yellow", high = "red", 
    midpoint = 1500)

Dpes it separate well by CtoN/core, within each CtoN the BulkDensitys are grouped together for the most part.

NMDS vs PCoA Definitely do NMDS – horshoe pattern in PCoA and that is not statistically robust for this analyses

NMDS is an unconstrained ordination method to visualize multivariate data in fewer dimensions. Depending on the properties of your data and your questions, different methods of ordination may be appropriate. PCoA (or MDS) is a metric version of NMDS, meaning that PCoA is a Euclidean representation of dissimilarities. So, like NMDS PCoA uses dissimilarities, but it is euclidean rather than rank-order which is used to preserve distances hierarchically. NMDS and distance-based ordinations/tests are often easier to use with community data because 0’s and 1’s, rare species are common and do not configure in euclidean space.

In general, ordination & multivaraite methods are defined by (1) whether or not using dissimilarity, (2) whether or not euclidean, (3) test for differences vs explaining differences …

PCoA, like PCA, returns a set of orthogonal axes with associated eignevalues that measure their importance. PCoA and PCA should produce the same results if the PCA is calculated on a covariance matrix with scaling of 1.

# PCoA - unconstrained, euclidean
pcoa.uni <- capscale(unifrac ~ 1, data = grps)
head(summary(pcoa.uni))
## 
## Call:
## capscale(formula = unifrac ~ 1, data = grps) 
## 
## Partitioning of squared Unknown distance:
##               Inertia Proportion
## Total           7.035          1
## Unconstrained   7.035          1
## 
## Eigenvalues, and their contribution to the squared Unknown distance 
## 
## Importance of components:
##                         MDS1   MDS2    MDS3    MDS4    MDS5    MDS6
## Eigenvalue            1.3010 1.0000 0.41559 0.33531 0.31043 0.27816
## Proportion Explained  0.1849 0.1422 0.05908 0.04767 0.04413 0.03954
## Cumulative Proportion 0.1849 0.3271 0.38618 0.43385 0.47798 0.51752
##                          MDS7    MDS8    MDS9   MDS10   MDS11   MDS12
## Eigenvalue            0.20583 0.18189 0.17040 0.17003 0.16510 0.16117
## Proportion Explained  0.02926 0.02586 0.02422 0.02417 0.02347 0.02291
## Cumulative Proportion 0.54678 0.57263 0.59686 0.62103 0.64450 0.66741
##                         MDS13   MDS14   MDS15   MDS16   MDS17   MDS18
## Eigenvalue            0.15982 0.15941 0.15635 0.14809 0.14746 0.14691
## Proportion Explained  0.02272 0.02266 0.02223 0.02105 0.02096 0.02088
## Cumulative Proportion 0.69013 0.71279 0.73501 0.75607 0.77703 0.79791
##                         MDS19   MDS20   MDS21   MDS22   MDS23   MDS24
## Eigenvalue            0.14169 0.14050 0.13770 0.13472 0.13311 0.12989
## Proportion Explained  0.02014 0.01997 0.01957 0.01915 0.01892 0.01846
## Cumulative Proportion 0.81805 0.83803 0.85760 0.87675 0.89568 0.91414
##                         MDS25   MDS26   MDS27   MDS28   MDS29
## Eigenvalue            0.12761 0.12672 0.12457 0.12397 0.10112
## Proportion Explained  0.01814 0.01801 0.01771 0.01762 0.01437
## Cumulative Proportion 0.93228 0.95029 0.96800 0.98563 1.00000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  3.779281 
## 
## 
## Species scores
## 
##      MDS1 MDS2 MDS3 MDS4 MDS5 MDS6
## Dim1                              
## Dim2                              
## Dim3                              
## Dim4                              
## Dim5                              
## Dim6                              
## ....                              
## 
## 
## Site scores (weighted sums of species scores)
## 
##         MDS1     MDS2    MDS3    MDS4    MDS5     MDS6
## X2   -0.9342 -0.17624  0.8908 -0.4748 -0.6785 -0.61331
## X1   -0.6354 -0.48145  1.5383  0.3548 -0.3117 -0.86911
## X7   -0.9234  0.05066 -0.6762 -1.0436 -0.6434 -0.18282
## X9   -0.7506 -0.27288 -0.8769 -0.4470 -0.5147  0.05184
## X12   0.3467 -0.90817 -0.5261  0.7733  1.4089 -0.14659
## X4   -0.6748 -0.46096  1.3057  0.2616 -0.3739 -0.71151
## ....
pcoa.df <- data.frame(scores(pcoa.uni, display = "Cores"))
## Error in match.arg(display, c("sites", "species", "wa", "lc", "bp", "cn", : 'arg' should be one of "sites", "species", "wa", "lc", "bp", "cn", "reg"
pcoa.df  # Core scores correspond to the SampleID
## Error in eval(expr, envir, enclos): object 'pcoa.df' not found
# bind with grouping data, in order
pcoa.df <- pcoa.df[match(grps$SampleID, rownames(pcoa.df)), ] %>% cbind(grps)
## Error in eval(lhs, parent, parent): object 'pcoa.df' not found

plot the pcoa

## dbrda pcoa.uni<-capscale(unifrac~CtoN+BulkDensity,data=grps)

Q2. If differences exist, are there particular taxa that contribute disproportionately to those differences?

Outputs:

1. Taxa bar charts of N-cycling microbes or microbial analysis to identify limits to decomposition or N cycling in older C-habitat association patterns.

Q3. Can the observed variation in microbial community composition be explained by C:N? By Carbon age?

Outputs:

1. Multiple regression on distance matrix