library(devtools)
library(BiocManager)
#BiocManager::install("ShortRead")
#BiocManager::install("dada2")
library(dada2)
library(dplyr)
library(vegan)
library(ade4)
library(ggplot2)
#https://mothur.s3.us-east-2.amazonaws.com/wiki/miseqsopdata.zip download the sample dataset from the mothur website
#https://zenodo.org/records/1172783/files/silva_nr_v132_train_set.fa.gz?download=1 you will also need to download the reference library to assign species to our ASVs
setwd("~/Desktop/Bioinformatics_Class/Metabarcoding_HW/")
In this assignment, we are analyzing a subset of samples from an experiment that compared gut microbiome communities in two groups of lab mice, one tested directly after weaning (“Early” group) and one several weeks after weaning (“Late” group). The samples were tested using metabarcoding primers focused on the 16S V4 region of rRNA, one of the best-studied areas used for identification of prokaryotes.
Read in the samples from the example data set. This dataset uses 250bp paired-end sequencing data, so there are two files for each of the 20 samples, one for the forward reads (R1_001) and one for the reverse reads (R2_001).
path <- "./MiSeq_SOP/" #set the path to the samples, this will work so long as the MiSeq_SOP folder has been unzipped in your current working directory, otherwise write out the full path
fnFs <- sort(list.files(path, pattern="_R1_001.fastq", full.names = T)) #read the forward read file names
fnRs <- sort(list.files(path, pattern="_R2_001.fastq", full.names = T)) #read the reverse read file names
sample.names <- sapply(strsplit(basename(fnFs), "_"),`[`, 1) #clip the sample names from the first part of each of the forward read file names
sample.names #print the sample names to ensure the above steps worked
## [1] "F3D0" "F3D1" "F3D141" "F3D142" "F3D143" "F3D144" "F3D145" "F3D146"
## [9] "F3D147" "F3D148" "F3D149" "F3D150" "F3D2" "F3D3" "F3D5" "F3D6"
## [17] "F3D7" "F3D8" "F3D9" "Mock"
plotQualityProfile(fnFs) #plot base calling quality plots for the forward reads
plotQualityProfile(fnRs) #plot base calling quality plots for the reverse reads
Below you will create filepaths where filtered versions of the sample data will be written. You will also run the filterandTrim function where you will need to set the following parameters:
Note: You can set different values for the forward and reverse reads for the integer parameters with a two-integer vector (e.g. c(1,2)). Note2: These are the default parameters from the DADA2 tutorial I used. Feel free to experiment around with them to see how they affect (and might be able to improve) the quality of the data.
filtFs <- file.path(path, "filtered", paste0(sample.names, "_F_filt.fastq.gz"))
filtRs <- file.path(path, "filtered", paste0(sample.names, "_R_filt.fastq.gz"))
names(filtFs) <- sample.names
names(filtRs) <- sample.names
out <- filterAndTrim(fwd=fnFs, filt=filtFs, rev=fnRs, filt.rev=filtRs, truncQ = 2, truncLen = c(240,240), trimLeft=0, maxLen=Inf, minLen=0, maxN=0, minQ=0, maxEE = c(2,2), rm.phix = T, rm.lowcomplex = 0, multithread = T, compress = T)
plotQualityProfile(filtFs)
plotQualityProfile(filtRs)
DADA2 implements a machine learning algorithm to attempt to estimate the number of errors due to sequencing, PCR errors, and other sources that have been artificially introduced into the data. This information is necessary for building the ASVs. When creating ASVs, DADA2 needs to be able to assess whether base pair differences between two sequences are real or likely due to some mistake in the sequencing process before deciding whether to split them into separate ASVs or lump them together.
errF <- learnErrors(filtFs, multithread=T)
## 27263040 total bases in 113596 reads from 20 samples will be used for learning the error rates.
errR <- learnErrors(filtRs, multithread=T)
## 27263040 total bases in 113596 reads from 20 samples will be used for learning the error rates.
Here, we can plot the predicted error rate for each combination of base pairs. You should generally see that the predicted error rate falls as quality (Phred) scores get higher. Along the diagonal are bases correctly called and they should be near 1 across all Phred scores (but may dip lower for lower quality bases).
plotErrors(errF, nominalQ=T)
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## log-10 transformation introduced infinite values.
plotErrors(errR, nominalQ=T)
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## log-10 transformation introduced infinite values.
Here, supply DADA with the error models built in the last step and it will identify sequences it believes are true variants in each of the samples. You can process each sample independently, or you can pool all the samples together, which will improve your ability to identify rare variants.
dadaFs <- dada(filtFs, errF, multithread = T)
## Sample 1 - 6038 reads in 1654 unique sequences.
## Sample 2 - 4672 reads in 1407 unique sequences.
## Sample 3 - 4477 reads in 1200 unique sequences.
## Sample 4 - 2297 reads in 684 unique sequences.
## Sample 5 - 2361 reads in 753 unique sequences.
## Sample 6 - 3287 reads in 963 unique sequences.
## Sample 7 - 4952 reads in 1272 unique sequences.
## Sample 8 - 3807 reads in 1207 unique sequences.
## Sample 9 - 11439 reads in 2608 unique sequences.
## Sample 10 - 9055 reads in 2127 unique sequences.
## Sample 11 - 9724 reads in 2410 unique sequences.
## Sample 12 - 4036 reads in 1247 unique sequences.
## Sample 13 - 15017 reads in 3028 unique sequences.
## Sample 14 - 5003 reads in 1164 unique sequences.
## Sample 15 - 3289 reads in 982 unique sequences.
## Sample 16 - 6206 reads in 1530 unique sequences.
## Sample 17 - 3917 reads in 966 unique sequences.
## Sample 18 - 4285 reads in 1195 unique sequences.
## Sample 19 - 5651 reads in 1477 unique sequences.
## Sample 20 - 4083 reads in 792 unique sequences.
dadaRs <- dada(filtRs, errR, multithread = T)
## Sample 1 - 6038 reads in 2989 unique sequences.
## Sample 2 - 4672 reads in 2347 unique sequences.
## Sample 3 - 4477 reads in 2395 unique sequences.
## Sample 4 - 2297 reads in 1342 unique sequences.
## Sample 5 - 2361 reads in 1375 unique sequences.
## Sample 6 - 3287 reads in 1983 unique sequences.
## Sample 7 - 4952 reads in 2844 unique sequences.
## Sample 8 - 3807 reads in 2097 unique sequences.
## Sample 9 - 11439 reads in 5855 unique sequences.
## Sample 10 - 9055 reads in 4423 unique sequences.
## Sample 11 - 9724 reads in 4894 unique sequences.
## Sample 12 - 4036 reads in 2274 unique sequences.
## Sample 13 - 15017 reads in 6975 unique sequences.
## Sample 14 - 5003 reads in 2552 unique sequences.
## Sample 15 - 3289 reads in 1868 unique sequences.
## Sample 16 - 6206 reads in 3179 unique sequences.
## Sample 17 - 3917 reads in 2073 unique sequences.
## Sample 18 - 4285 reads in 2142 unique sequences.
## Sample 19 - 5651 reads in 2797 unique sequences.
## Sample 20 - 4083 reads in 1367 unique sequences.
Here we can merge the paired-end reads into one sequence. DADA expects there to be a region that overlaps between both reads and will remove any pairs that don’t have a perfectly overlapping region.
mergers <- mergePairs(dadaF = dadaFs, filtFs, dadaR = dadaRs, filtRs)
head(mergers[[1]])
## sequence
## 1 TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGCCTGCCAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTACAGCCGTTGAAACTGCCGGGCTCGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACCCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGTGCGGGGATCAAACAGG
## 2 TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGAAGATCAAGTCAGCGGTAAAATTGAGAGGCTCAACCTCTTCGAGCCGTTGAAACTGGTTTTCTTGAGTGAGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCTCAACTGACGCTCATGCACGAAAGTGTGGGTATCGAACAGG
## 3 TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGCTTTTAAGTCAGCGGTAAAAATTCGGGGCTCAACCCCGTCCGGCCGTTGAAACTGGGGGCCTTGAGTGGGCGAGAAGAAGGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACCCCGATTGCGAAGGCAGCCTTCCGGCGCCCTACTGACGCTGAGGCACGAAAGTGCGGGGATCGAACAGG
## 5 TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGACTCTCAAGTCAGCGGTCAAATCGCGGGGCTCAACCCCGTTCCGCCGTTGAAACTGGGAGCCTTGAGTGCGCGAGAAGTAGGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCCTACCGGCGCGCAACTGACGCTCATGCACGAAAGCGTGGGTATCGAACAGG
## 6 TACGTAGGGGGCAAGCGTTATCCGGAATTACTGGGTGTAAAGGGAGCGTAGACGGTAATGCAAGTCTGGAGTGAAAGGCGGGGGCCCAACCCCCGGACTGCTCTGGAAACTGTGTAACTGGAGTGCAGGAGAGGCAGGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACTGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACAGG
## 7 TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACAGG
## abundance forward reverse nmatch nmismatch nindel prefer accept
## 1 442 2 3 228 0 0 1 TRUE
## 2 426 1 15 228 0 0 1 TRUE
## 3 419 3 1 228 0 0 1 TRUE
## 5 283 5 7 228 0 0 1 TRUE
## 6 276 9 2 227 0 0 2 TRUE
## 7 262 6 6 228 0 0 1 TRUE
You should have fairly high success when merging reads. If you have lost many reads (>20%), you may want to go back and make sure you are not truncating/trimming the overlapping parts of reads and/or apply more stringent base quality filtering.
merge_successF <- NULL
for(i in 1:length(mergers)){
merge_successF[i] <- sum(mergers[[i]]$abundance)/sum(dadaFs[[i]]@.Data[[2]]$abundance)
}
merge_successR <- NULL
for(i in 1:length(mergers)){
merge_successR[i] <- sum(mergers[[i]]$abundance)/sum(dadaRs[[i]]@.Data[[2]]$abundance)
}
merge_successF
## [1] 0.9080149 0.9383011 0.8620137 0.9212454 0.8660436 0.7674125 0.9261801
## [8] 0.8563729 0.8655589 0.8213525 0.8727863 0.8197060 0.9502855 0.8515195
## [15] 0.9134131 0.8702177 0.9160986 0.9028176 0.9546362 0.9635468
merge_successR
## [1] 0.9086294 0.9486053 0.8639908 0.9345100 0.8660436 0.7769623 0.9238747
## [8] 0.8723579 0.8676405 0.8396194 0.8726949 0.8184394 0.9634902 0.8546571
## [15] 0.9177475 0.8677849 0.9182654 0.9026021 0.9510123 0.9630724
Now we can combine sequence abundance data from across all of the samples into a single table.
seqtab <- makeSequenceTable(mergers)
dim(seqtab) #print the dimensions of the ASV count x sample table
## [1] 20 253
rowSums(seqtab > 0) #calculate the number of ASVs in a sample with at least one sequence found
## F3D0 F3D1 F3D141 F3D142 F3D143 F3D144 F3D145 F3D146 F3D147 F3D148 F3D149
## 99 90 68 45 52 41 67 77 111 101 114
## F3D150 F3D2 F3D3 F3D5 F3D6 F3D7 F3D8 F3D9 Mock
## 74 136 63 71 85 59 92 97 20
table(nchar(getSequences(seqtab))) #print the lengths of the representative sequences used to define each ASV
##
## 251 252 253 254 255
## 1 52 192 6 2
Now that we have our de-noised ASVs, we can look for chimeras - merged paired sequences that were incorrectly paired with the wrong mate, often due to an error in the PCR phase. Most of the time, these chimeras will be relatively rare sequences that have partial alignments to two different more common sequences in the dataset.
seqtab_nochimera <- removeBimeraDenovo(seqtab, method="consensus", multithread=T)
ncol(seqtab_nochimera)/ncol(seqtab) #see the proportion of ASVs remaining after chimera filtering
## [1] 0.9209486
sum(seqtab_nochimera/sum(seqtab)) #see the proportion of the total sequence count after chimera filtering
## [1] 0.9807675
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN), sapply(mergers, getN), rowSums(seqtab_nochimera), (rowSums(seqtab_nochimera)/out[,1])*100)
colnames(track) <- c("input", "filtered", "denoisedF", "denoisedR", "merged", "non-chimeric", "final-kept(%)")
row.names(track) <- sample.names
track
## input filtered denoisedF denoisedR merged non-chimeric final-kept(%)
## F3D0 7793 6038 5914 5910 5370 5370 68.90799
## F3D1 5869 4672 4603 4553 4319 4319 73.59005
## F3D141 5958 4477 4370 4360 3767 3711 62.28600
## F3D142 3183 2297 2184 2153 2012 1960 61.57713
## F3D143 3178 2361 2247 2247 1946 1931 60.76149
## F3D144 4827 3287 3173 3134 2435 2395 49.61674
## F3D145 7377 4952 4809 4821 4454 4358 59.07550
## F3D146 5021 3807 3711 3643 3178 3137 62.47759
## F3D147 17070 11439 11254 11227 9741 9276 54.34095
## F3D148 12405 9055 8917 8723 7324 7109 57.30754
## F3D149 13083 9724 9543 9544 8329 8163 62.39395
## F3D150 5509 4036 3877 3883 3178 3178 57.68742
## F3D2 19620 15017 14885 14681 14145 13823 70.45362
## F3D3 6758 5003 4903 4885 4175 3981 58.90796
## F3D5 4448 3289 3176 3161 2901 2901 65.22032
## F3D6 7989 6206 6064 6081 5277 5223 65.37739
## F3D7 5129 3917 3814 3805 3494 3372 65.74381
## F3D8 5294 4285 4188 4189 3781 3768 71.17491
## F3D9 7070 5651 5511 5532 5261 5208 73.66337
## Mock 4779 4083 4060 4062 3912 3912 81.85813
There should not be any major drops in the number of sequences in the samples for any of the steps, and the final counts should include most of the original reads.
taxa <- assignTaxonomy(seqtab_nochimera, "./silva_nr_v132_train_set.fa.gz", multithread = T)
taxa.print <- taxa
rownames(taxa.print) <- NULL #removing the rownames (which are full sequences) for display purposes
head(taxa.print)
## Kingdom Phylum Class Order Family
## [1,] "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales" "Muribaculaceae"
## [2,] "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales" "Muribaculaceae"
## [3,] "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales" "Muribaculaceae"
## [4,] "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales" "Bacteroidaceae"
## [5,] "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales" "Rikenellaceae"
## [6,] "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales" "Muribaculaceae"
## Genus
## [1,] NA
## [2,] NA
## [3,] NA
## [4,] "Bacteroides"
## [5,] "Alistipes"
## [6,] NA
We can check the accuracy of the taxonomy assignment by comparing the results to what is expected given a “mock” community constructed of known taxa that was sequenced alongside the other samples.
unqs.mock <- seqtab_nochimera["Mock",]
unqs.mock <- sort(unqs.mock[unqs.mock > 0], decreasing = T) #drop ASVs missing from the Mock community
cat("DADA2 inferred", length(unqs.mock), "sample sequences present in the mock community.\n")
## DADA2 inferred 20 sample sequences present in the mock community.
mock.ref <- getSequences(file.path(path, "HMP_MOCK.v35.fasta"))
match.ref <- sum(sapply(names(unqs.mock), function(x) any(grepl(x, mock.ref))))
cat("of those,", sum(match.ref), "were exact matches to expected reference sequences.\n")
## of those, 20 were exact matches to expected reference sequences.
mouse_data.df <- read.table("./MiSeq_SOP/mouse.time.design", header=T)
mouse_community.df <- merge(mouse_data.df, as.data.frame(seqtab_nochimera), by.x="group", by.y="row.names")
Rarefaction and species accumulation curves are used to assess whether or not our sampling was effective enough to capture most of the variation present in the communities we are interested in.
Rarefaction curves are used to make sure we recovered enough reads in our samples to be confident we are representing close to the actual species richness present. With species accumulation curves, we are assessing wheter we sampled enough sites to capture most of the diversity present in these communities. In both cases, we should see arcs that level off near an asymptote (i.e. the inferred “true” number of species/ASVs present).
rarecurve(seqtab_nochimera[-20,], sample=min(rowSums(seqtab_nochimera[-20,])), col="red")
mouse_specaccum <- specaccum(seqtab_nochimera[-20,])
mouse_specaccum.df <- data.frame(sites=mouse_specaccum$sites, richness=mouse_specaccum$richness, sd=mouse_specaccum$sd)
ggplot(data=mouse_specaccum.df)+
geom_line(aes(x=sites, y=richness))+
geom_ribbon(aes(x=sites, ymin=richness-sd, ymax=richness+sd), alpha=0.5)
mouse.pca <- dudi.pca(mouse_community.df[,-c(1:2)], scannf=F, nf=2)
ggplot(data=mouse.pca$li, aes(x=Axis1, y=Axis2, color=mouse_community.df$time))+
geom_point()+
stat_ellipse()+
labs(color="Time Group")+
theme_classic()
loadings_taxonomy <- merge(mouse.pca$co, taxa, by="row.names") #attach PC loadings to the taxonomy assignment to see what taxa are driving variation
head(sort_by(loadings_taxonomy[,-1], abs(loadings_taxonomy$Comp1), decreasing=T), n = 20)
## Comp1 Comp2 Kingdom Phylum Class
## 136 -0.9707309 0.075193540 Bacteria Firmicutes Clostridia
## 50 -0.9698994 -0.023367828 Bacteria Firmicutes Clostridia
## 138 -0.9643137 0.014661595 Bacteria Firmicutes Clostridia
## 132 -0.9411651 0.060223986 Bacteria Firmicutes Clostridia
## 198 -0.9361116 -0.128905332 Bacteria Firmicutes Clostridia
## 81 -0.9338237 0.083838976 Bacteria Firmicutes Clostridia
## 143 -0.9332400 0.160635075 Bacteria Firmicutes Clostridia
## 196 -0.9327119 0.108762840 Bacteria Firmicutes Clostridia
## 133 -0.9317057 0.008309817 Bacteria Firmicutes Clostridia
## 83 -0.9300247 0.027069724 Bacteria Firmicutes Clostridia
## 222 -0.9047067 0.110638957 Bacteria Firmicutes Clostridia
## 90 -0.9031631 -0.143833727 Bacteria Firmicutes Clostridia
## 162 -0.9009005 0.090411192 Bacteria Firmicutes Clostridia
## 78 -0.8992363 -0.157644005 Bacteria Firmicutes Clostridia
## 47 -0.8888190 -0.076713750 Bacteria Firmicutes Clostridia
## 5 -0.8855748 0.053158401 Bacteria Proteobacteria Gammaproteobacteria
## 192 -0.8843518 -0.022641515 Bacteria Firmicutes Clostridia
## 145 -0.8720650 -0.010608597 Bacteria Firmicutes Clostridia
## 98 -0.8683911 -0.012190885 Bacteria Firmicutes Clostridia
## 86 -0.8664072 0.236371871 Bacteria Firmicutes Clostridia
## Order Family Genus
## 136 Clostridiales Lachnospiraceae Acetatifactor
## 50 Clostridiales Ruminococcaceae Ruminiclostridium_5
## 138 Clostridiales Lachnospiraceae Lachnospiraceae_NK4A136_group
## 132 Clostridiales Lachnospiraceae Lachnospiraceae_NK4A136_group
## 198 Clostridiales Ruminococcaceae Oscillibacter
## 81 Clostridiales Lachnospiraceae Acetatifactor
## 143 Clostridiales Lachnospiraceae <NA>
## 196 Clostridiales Ruminococcaceae Oscillibacter
## 133 Clostridiales Lachnospiraceae Lachnospiraceae_NK4A136_group
## 83 Clostridiales Lachnospiraceae Acetatifactor
## 222 Clostridiales Lachnospiraceae <NA>
## 90 Clostridiales Lachnospiraceae Lachnospiraceae_UCG-006
## 162 Clostridiales Lachnospiraceae Lachnospiraceae_UCG-001
## 78 Clostridiales Lachnospiraceae Roseburia
## 47 Clostridiales Ruminococcaceae Hydrogenoanaerobacterium
## 5 Pseudomonadales Pseudomonadaceae Pseudomonas
## 192 Clostridiales Ruminococcaceae Intestinimonas
## 145 Clostridiales Lachnospiraceae <NA>
## 98 Clostridiales Lachnospiraceae Lachnospiraceae_NK4A136_group
## 86 Clostridiales Lachnospiraceae Lachnospiraceae_NK4A136_group
head(sort_by(loadings_taxonomy[,-1], abs(loadings_taxonomy$Comp2), decreasing=T), n = 20)
## Comp1 Comp2 Kingdom Phylum Class Order
## 104 0.176725227 -0.9040260 Bacteria Firmicutes Clostridia Clostridiales
## 116 -0.202811449 -0.8758021 Bacteria Firmicutes Clostridia Clostridiales
## 25 0.078206792 -0.8747599 Bacteria Bacteroidetes Bacteroidia Bacteroidales
## 166 0.018459346 -0.8733230 Bacteria Firmicutes Clostridia Clostridiales
## 135 -0.115286615 -0.8585803 Bacteria Firmicutes Clostridia Clostridiales
## 152 0.006966416 -0.8577845 Bacteria Firmicutes Clostridia Clostridiales
## 231 0.166537291 -0.8508686 Bacteria Firmicutes Clostridia Clostridiales
## 129 0.245881802 -0.8470250 Bacteria Firmicutes Clostridia Clostridiales
## 62 0.207283994 -0.8376596 Bacteria Firmicutes Clostridia Clostridiales
## 20 0.033176735 -0.8283804 Bacteria Bacteroidetes Bacteroidia Bacteroidales
## 108 0.151222392 -0.8206836 Bacteria Firmicutes Clostridia Clostridiales
## 123 0.279416133 -0.8090054 Bacteria Firmicutes Clostridia Clostridiales
## 84 0.190463409 -0.7957913 Bacteria Firmicutes Clostridia Clostridiales
## 205 0.334205622 -0.7845644 Bacteria Firmicutes Bacilli Lactobacillales
## 112 0.141555548 -0.7828789 Bacteria Firmicutes Clostridia Clostridiales
## 165 -0.551218636 -0.7779333 Bacteria Firmicutes Clostridia Clostridiales
## 19 0.262117213 -0.7670300 Bacteria Bacteroidetes Bacteroidia Bacteroidales
## 97 0.324920160 -0.7668876 Bacteria Firmicutes Clostridia Clostridiales
## 229 0.207286198 -0.7608486 Bacteria Firmicutes Clostridia Clostridiales
## 102 0.089002664 -0.7550471 Bacteria Firmicutes Clostridia Clostridiales
## Family Genus
## 104 Lachnospiraceae Lachnospiraceae_NK4A136_group
## 116 Lachnospiraceae <NA>
## 25 Muribaculaceae <NA>
## 166 Lachnospiraceae <NA>
## 135 Lachnospiraceae Acetatifactor
## 152 Lachnospiraceae <NA>
## 231 Lachnospiraceae Roseburia
## 129 Lachnospiraceae <NA>
## 62 Ruminococcaceae Anaerotruncus
## 20 Muribaculaceae <NA>
## 108 Lachnospiraceae <NA>
## 123 Lachnospiraceae Lachnoclostridium
## 84 Lachnospiraceae Lachnospiraceae_NK4A136_group
## 205 Lactobacillaceae Lactobacillus
## 112 Lachnospiraceae <NA>
## 165 Lachnospiraceae <NA>
## 19 Muribaculaceae <NA>
## 97 Lachnospiraceae Lachnospiraceae_NK4A136_group
## 229 Lachnospiraceae Roseburia
## 102 Lachnospiraceae Acetatifactor
PERMANOVAs are a method where we can test if treatment groups have significantly different community compositions. They are based on distance matrices of our community data (here we’ll use Bray-Curtis distances, a common metric used for both traditional species counts and eDNA community data), and because they use a permuted bootstrapping testing structure, do not rely on assumptions of empirical distributions like a typical ANOVA or MANOVA.
adonis2(mouse_community.df[,-c(1:2)] ~ time, data=mouse_community.df, method="bray")
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = mouse_community.df[, -c(1:2)] ~ time, data = mouse_community.df, method = "bray")
## Df SumOfSqs R2 F Pr(>F)
## Model 1 0.64399 0.27135 6.3309 0.001 ***
## Residual 17 1.72928 0.72865
## Total 18 2.37327 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1