Exercise 2 Multivariate Statistics: NMDS and PERMANOVA
1 Importing and checking multivariate dataset
- Anderson et al. (2011)
- Ramette (2007)
- Oksanen (2013)
- Oksanen et al. (2007)
phyloseq-class experiment-level object
otu_table() OTU Table: [ 19216 taxa and 26 samples ]
sample_data() Sample Data: [ 26 samples by 7 sample variables ]
tax_table() Taxonomy Table: [ 19216 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]
Taxonomy Table: [5 taxa by 7 taxonomic ranks]:
Kingdom Phylum Class Order Family
549322 "Archaea" "Crenarchaeota" "Thermoprotei" NA NA
522457 "Archaea" "Crenarchaeota" "Thermoprotei" NA NA
951 "Archaea" "Crenarchaeota" "Thermoprotei" "Sulfolobales" "Sulfolobaceae"
244423 "Archaea" "Crenarchaeota" "Sd-NA" NA NA
586076 "Archaea" "Crenarchaeota" "Sd-NA" NA NA
Genus Species
549322 NA NA
522457 NA NA
951 "Sulfolobus" "Sulfolobusacidocaldarius"
244423 NA NA
586076 NA NA
phyloseq-class experiment-level object
otu_table() OTU Table: [ 1000 taxa and 26 samples ]
sample_data() Sample Data: [ 26 samples by 7 sample variables ]
tax_table() Taxonomy Table: [ 1000 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 1000 tips and 999 internal nodes ]
Square root transformation
Wisconsin double standardization
Run 0 stress 0.1522654
Run 1 stress 0.1775497
Run 2 stress 0.1775501
Run 3 stress 0.1775497
Run 4 stress 0.1522655
... Procrustes: rmse 0.0006112153 max resid 0.002067813
... Similar to previous best
Run 5 stress 0.1522653
... New best solution
... Procrustes: rmse 0.0004049911 max resid 0.001364409
... Similar to previous best
Run 6 stress 0.1942675
Run 7 stress 0.1926255
Run 8 stress 0.1522663
... Procrustes: rmse 0.000893743 max resid 0.00299815
... Similar to previous best
Run 9 stress 0.1775497
Run 10 stress 0.1774573
Run 11 stress 0.1522653
... New best solution
... Procrustes: rmse 0.0001509282 max resid 0.0005088348
... Similar to previous best
Run 12 stress 0.1522655
... Procrustes: rmse 0.0003089112 max resid 0.001018589
... Similar to previous best
Run 13 stress 0.1522653
... Procrustes: rmse 0.0001585899 max resid 0.0005232999
... Similar to previous best
Run 14 stress 0.1522653
... Procrustes: rmse 9.380553e-05 max resid 0.0003118606
... Similar to previous best
Run 15 stress 0.1522653
... Procrustes: rmse 2.612925e-05 max resid 7.942554e-05
... Similar to previous best
Run 16 stress 0.1522654
... Procrustes: rmse 0.0001868435 max resid 0.0006181396
... Similar to previous best
Run 17 stress 0.1522654
... Procrustes: rmse 0.0002325483 max resid 0.0007890376
... Similar to previous best
Run 18 stress 0.1522663
... Procrustes: rmse 0.0007201793 max resid 0.002413511
... Similar to previous best
Run 19 stress 0.1522653
... Procrustes: rmse 7.193655e-05 max resid 0.000223239
... Similar to previous best
Run 20 stress 0.1940658
*** Solution reached
Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(veganifyOTU(physeq)))
Distance: bray
Dimensions: 2
Stress: 0.1522653
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'
p1 <- plot_ordination(GP3, nmds.bray, color = "SampleType", label = "X.SampleID")
p1 + geom_point(size = 2) + theme_bw() + scale_colour_brewer(type = "qual", palette = "Set1")veganotu = function(physeq) {
require("vegan")
OTU = otu_table(physeq)
if (taxa_are_rows(OTU)) {
OTU = t(OTU)
}
return(as(OTU, "matrix"))
}# export data from phyloseq to vegan-compatible object
GP3_vegan <- veganotu(GP3)
# make a data frame that can be used in vegan from the sample_data in the
# phyloseq object
sampledf <- data.frame(sample_data(GP3))
GP3_BC <- vegdist(wisconsin(sqrt(GP3_vegan)), method = "bray")
betadisp_GP3 <- betadisper(GP3_BC, sampledf$SampleType)
betadisp_GP3
Homogeneity of multivariate dispersions
Call: betadisper(d = GP3_BC, group = sampledf$SampleType)
No. of Positive Eigenvalues: 25
No. of Negative Eigenvalues: 0
Average distance to median:
Feces Freshwater Freshwater (creek) Mock
0.33178 0.29436 0.12327 0.08393
Ocean Sediment (estuary) Skin Soil
0.26050 0.21265 0.26153 0.28633
Tongue
0.27894
Eigenvalues for PCoA axes:
(Showing 8 of 25 eigenvalues)
PCoA1 PCoA2 PCoA3 PCoA4 PCoA5 PCoA6 PCoA7 PCoA8
1.9067 1.3699 1.1836 0.9987 0.8632 0.7644 0.5841 0.3453
Analysis of Variance Table
Response: Distances
Df Sum Sq Mean Sq F value Pr(>F)
Groups 8 0.16767 0.0209582 2.9319 0.02965 *
Residuals 17 0.12152 0.0071483
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Permutation test for homogeneity of multivariate dispersions
Permutation: free
Number of permutations: 999
Response: Distances
Df Sum Sq Mean Sq F N.Perm Pr(>F)
Groups 8 0.16767 0.0209582 2.9319 999 0.03 *
Residuals 17 0.12152 0.0071483
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = distances ~ group, data = df)
$group
diff lwr upr
Freshwater-Feces -0.037427339 -0.29582040 0.22096572
Freshwater (creek)-Feces -0.208515110 -0.43639637 0.01936615
Mock-Feces -0.247849374 -0.47573063 -0.01996812
Ocean-Feces -0.071286840 -0.29916810 0.15659442
Sediment (estuary)-Feces -0.119131402 -0.34701266 0.10874986
Skin-Feces -0.070253329 -0.29813459 0.15762793
Soil-Feces -0.045449007 -0.27333027 0.18243225
Tongue-Feces -0.052845174 -0.31123823 0.20554789
Freshwater (creek)-Freshwater -0.171087771 -0.44345797 0.10128243
Mock-Freshwater -0.210422036 -0.48279224 0.06194817
Ocean-Freshwater -0.033859502 -0.30622970 0.23851070
Sediment (estuary)-Freshwater -0.081704063 -0.35407426 0.19066614
Skin-Freshwater -0.032825990 -0.30519619 0.23954421
Soil-Freshwater -0.008021668 -0.28039187 0.26434853
Tongue-Freshwater -0.015417835 -0.31378444 0.28294877
Mock-Freshwater (creek) -0.039334264 -0.28294958 0.20428105
Ocean-Freshwater (creek) 0.137228270 -0.10638704 0.38084358
Sediment (estuary)-Freshwater (creek) 0.089383708 -0.15423161 0.33299902
Skin-Freshwater (creek) 0.138261781 -0.10535353 0.38187709
Soil-Freshwater (creek) 0.163066103 -0.08054921 0.40668142
Tongue-Freshwater (creek) 0.155669936 -0.11670026 0.42804014
Ocean-Mock 0.176562534 -0.06705278 0.42017785
Sediment (estuary)-Mock 0.128717972 -0.11489734 0.37233329
Skin-Mock 0.177596045 -0.06601927 0.42121136
Soil-Mock 0.202400367 -0.04121495 0.44601568
Tongue-Mock 0.195004200 -0.07736600 0.46737440
Sediment (estuary)-Ocean -0.047844562 -0.29145988 0.19577075
Skin-Ocean 0.001033511 -0.24258180 0.24464882
Soil-Ocean 0.025837833 -0.21777748 0.26945315
Tongue-Ocean 0.018441666 -0.25392853 0.29081187
p adj
Freshwater-Feces 0.9998024
Freshwater (creek)-Feces 0.0876631
Mock-Feces 0.0274547
Ocean-Feces 0.9656755
Sediment (estuary)-Feces 0.6554145
Skin-Feces 0.9684065
Soil-Feces 0.9980333
Tongue-Feces 0.9976615
Freshwater (creek)-Freshwater 0.4375204
Mock-Freshwater 0.2092521
Ocean-Freshwater 0.9999371
Sediment (estuary)-Freshwater 0.9730060
Skin-Freshwater 0.9999502
Soil-Freshwater 1.0000000
Tongue-Freshwater 0.9999999
Mock-Freshwater (creek) 0.9995616
Ocean-Freshwater (creek) 0.5699091
Sediment (estuary)-Freshwater (creek) 0.9197110
Skin-Freshwater (creek) 0.5609711
Soil-Freshwater (creek) 0.3612753
Tongue-Freshwater (creek) 0.5525699
Ocean-Mock 0.2723881
Sediment (estuary)-Mock 0.6436923
Skin-Mock 0.2662747
Soil-Mock 0.1485728
Tongue-Mock 0.2853980
Sediment (estuary)-Ocean 0.9982325
Skin-Ocean 1.0000000
Soil-Ocean 0.9999812
Tongue-Ocean 0.9999994
[ reached getOption("max.print") -- omitted 6 rows ]
Call:
adonis(formula = GP3_BC ~ SampleType, data = sampledf)
Permutation: free
Number of permutations: 999
Terms added sequentially (first to last)
Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
SampleType 8 7.8090 0.97612 9.8278 0.82222 0.001 ***
Residuals 17 1.6885 0.09932 0.17778
Total 25 9.4975 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
2 Using vegan to do the NMDS plot
Call:
metaMDS(comm = GP3_vegan, trace = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(GP3_vegan))
Distance: bray
Dimensions: 2
Stress: 0.1522653
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(GP3_vegan))'
library(RColorBrewer)
# Build a color palette, 9 (number of sample types) colours coming from the Set1
# color palette (which was used in the original NMDS above)
colpal <- brewer.pal(9, "Set1")plot(GP3_mds, type = "n") #ordiplot can also be used.
with(sampledf, points(GP3_mds, display = "sites", col = colpal[SampleType], pch = 16,
cex = 1.2))
text(GP3_mds, cex = 0.5, col = "black", pos = 1)
with(sampledf, legend("topright", legend = levels(SampleType), bty = "n", col = colpal,
pch = 16, cex = 0.7))
text(2.5, -1.35, "Stress = 0.152", cex = 0.7)3 With mock community samples removed
GP4 <- subset_samples(GP3, SampleType != "Mock")
GP4_vegan <- veganotu(GP4)
sampledf2 <- data.frame(sample_data(GP4))
GP4_BC <- vegdist(wisconsin(sqrt(GP4_vegan)), method = "bray")
nmds.bray2 <- ordinate(GP4, method = "NMDS", distance = "bray")Square root transformation
Wisconsin double standardization
Run 0 stress 0.1596779
Run 1 stress 0.1599026
... Procrustes: rmse 0.01404802 max resid 0.05637365
Run 2 stress 0.1599026
... Procrustes: rmse 0.0140143 max resid 0.05624806
Run 3 stress 0.19821
Run 4 stress 0.159678
... Procrustes: rmse 8.259375e-05 max resid 0.0002345431
... Similar to previous best
Run 5 stress 0.1604022
Run 6 stress 0.1599026
... Procrustes: rmse 0.01403124 max resid 0.0563113
Run 7 stress 0.1596779
... New best solution
... Procrustes: rmse 0.0001521897 max resid 0.0004457244
... Similar to previous best
Run 8 stress 0.1596779
... Procrustes: rmse 0.0001607379 max resid 0.0004795825
... Similar to previous best
Run 9 stress 0.1599029
... Procrustes: rmse 0.01394395 max resid 0.0560914
Run 10 stress 0.1596779
... New best solution
... Procrustes: rmse 3.086897e-05 max resid 7.3232e-05
... Similar to previous best
Run 11 stress 0.1604025
Run 12 stress 0.1596779
... Procrustes: rmse 0.0001612356 max resid 0.0004806272
... Similar to previous best
Run 13 stress 0.1599025
... Procrustes: rmse 0.0140568 max resid 0.0565045
Run 14 stress 0.1596779
... Procrustes: rmse 0.0001270942 max resid 0.0003726501
... Similar to previous best
Run 15 stress 0.1982108
Run 16 stress 0.2900923
Run 17 stress 0.1596779
... Procrustes: rmse 2.303684e-05 max resid 6.233002e-05
... Similar to previous best
Run 18 stress 0.1984836
Run 19 stress 0.1599026
... Procrustes: rmse 0.01410313 max resid 0.05665913
Run 20 stress 0.2775131
*** Solution reached
p2 <- plot_ordination(GP4, nmds.bray2, color = "SampleType", label = "X.SampleID")
p2 + geom_point(size = 2) + theme_bw() + scale_colour_brewer(type = "qual", palette = "Set1")
Homogeneity of multivariate dispersions
Call: betadisper(d = GP4_BC, group = sampledf2$SampleType)
No. of Positive Eigenvalues: 22
No. of Negative Eigenvalues: 0
Average distance to median:
Feces Freshwater Freshwater (creek) Ocean
0.3270 0.2959 0.1362 0.2626
Sediment (estuary) Skin Soil Tongue
0.2289 0.2656 0.2840 0.2825
Eigenvalues for PCoA axes:
(Showing 8 of 22 eigenvalues)
PCoA1 PCoA2 PCoA3 PCoA4 PCoA5 PCoA6 PCoA7 PCoA8
1.5737 1.3035 1.1163 0.9559 0.7418 0.5758 0.3748 0.2511
Analysis of Variance Table
Response: Distances
Df Sum Sq Mean Sq F value Pr(>F)
Groups 7 0.072212 0.0103160 1.3221 0.306
Residuals 15 0.117038 0.0078026
Permutation test for homogeneity of multivariate dispersions
Permutation: free
Number of permutations: 999
Response: Distances
Df Sum Sq Mean Sq F N.Perm Pr(>F)
Groups 7 0.072212 0.0103160 1.3221 999 0.294
Residuals 15 0.117038 0.0078026
Call:
adonis(formula = GP4_BC ~ SampleType, data = sampledf2)
Permutation: free
Number of permutations: 999
Terms added sequentially (first to last)
Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
SampleType 7 6.4387 0.91982 8.1365 0.79154 0.001 ***
Residuals 15 1.6957 0.11305 0.20846
Total 22 8.1344 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reference
Anderson, Marti J., Thomas O. Crist, Jonathan M. Chase, Mark Vellend, Brian D. Inouye, Amy L. Freestone, Nathan J. Sanders, et al. 2011. “Navigating the Multiple Meanings of β Diversity: A Roadmap for the Practicing Ecologist.” Ecology Letters 14 (1): 19–28. https://doi.org/10.1111/j.1461-0248.2010.01552.x.
Oksanen, Jari. 2013. “Multivariate Analysis of Ecological Communities in R: Vegan Tutorial.” R Package Version, January, 1–43.
Oksanen, Jari, Roeland Kindt, Pierre Legendre, Bob Hara, M. Henry, and Hank Stevens. 2007. “The Vegan Package,” November.
Ramette, Alban. 2007. “Multivariate Analyses in Microbial Ecology.” FEMS Microbiology Ecology 62 (17892477): 142–60. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2121141/.