Any associations with dieldrin loss?
Which microorganisms can be associated with this?
Is this associated with Soil organic matter decomposition?
How does this relate to known xenobiotic degradation pathways/enzymes?
What are the limitations of picrust predictions?
[1] 11313798
`set.seed(TRUE)` was used to initialize repeatable random subsampling.
Please record this for your records so others can reproduce.
Try `set.seed(TRUE); .Random.seed` for the full vector
...
1OTUs were removed because they are no longer
present in any sample after random subsampling
...
ps object for KO metagenome
physeq
phyloseq-class experiment-level object
otu_table() OTU Table: [ 6728 taxa and 36 samples ]
sample_data() Sample Data: [ 36 samples by 77 sample variables ]
ps object for the pathway file
phyloseq-class experiment-level object
otu_table() OTU Table: [ 168 taxa and 36 samples ]
sample_data() Sample Data: [ 36 samples by 77 sample variables ]
PCA will reduce the noise and spits out only relevant variation into independant axes. The question then is what drives the variation of these axis, i.e. the variation in enzymes as predicted by picrust2 and which enzyme are driven more or less in these axis.
(ps_clr <- microbiome::transform(physeq, "clr"))
phyloseq-class experiment-level object
otu_table() OTU Table: [ 6728 taxa and 36 samples ]
sample_data() Sample Data: [ 36 samples by 77 sample variables ]
phyloseq::otu_table(ps_clr)[1:5, 1:6]
OTU Table: [5 taxa and 6 samples]
taxa are rows
C-1 C-10 C-11 C-12 C-13 C-14
K00001 3.08232112 3.4835033 3.1461450 3.4613316 3.5909267 3.5909645
K00002 -0.66688751 -1.7160013 -6.9421413 -6.7524959 -6.6424699 -6.6426013
K00003 3.73985162 4.2517402 4.0085975 4.1580456 4.2861852 4.2783313
K00004 0.03585803 -0.2907328 -0.3148322 0.2894002 -0.1249358 -0.2936201
K00005 -0.61196184 -3.7181646 -1.0426408 -2.7843547 -2.0381677 -1.5851315
ord_clr <- phyloseq::ordinate(ps_clr, "RDA")
#Examine eigenvalues and % prop. variance explained
head(ord_clr$CA$eig)
PC1 PC2 PC3 PC4 PC5 PC6
1182.4395 905.7623 579.1140 454.5662 412.6621 390.8011
sapply(ord_clr$CA$eig[1:8], function(x) x / sum(ord_clr$CA$eig))
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
0.16245723 0.12444411 0.07956539 0.06245356 0.05669630 0.05369278 0.04501448 0.03858885
adding the pc scores to metadata
Correlations with psych package, selecting the relevant variables
Filtering of significant correlations
ind.coord <- data.frame(ord_clr$CA$v)
sdev_ind <- apply(ind.coord, 1, sd)
ind_cont_PCA1 <- data.frame(PCA = (100*(1 / nrow(ind.coord)*(ind.coord$PC1^2 /sdev_ind))))
ind_cont_PCA1_top <- ind_cont_PCA1 %>%
rownames_to_column("otu") %>%
filter(PCA >= 0.001803355)%>%
column_to_rownames("otu")
ind_cont_PCA2 <- data.frame(PCA = (100*(1 / nrow(ind.coord)*(ind.coord$PC2^2 /sdev_ind))))
ind_cont_PCA2_top <- ind_cont_PCA2 %>%
rownames_to_column("otu") %>%
filter(PCA >= 0.002)%>%
column_to_rownames("otu")
ind_cont_PCA3 <- data.frame(PCA = (100*(1 / nrow(ind.coord)*(ind.coord$PC3^2 / sdev_ind))))
ind_cont_PCA3_top <- ind_cont_PCA3 %>%
rownames_to_column("otu") %>%
filter(PCA >= 0.002)%>%
column_to_rownames("otu")
ind_cont_PCA_top <- rbind(ind_cont_PCA1_top, ind_cont_PCA2_top, ind_cont_PCA3_top)
sum(ind_cont_PCA1_top$PCA) / sum(ind_cont_PCA1$PCA)
[1] 0.01260249
nrow(ind_cont_PCA1_top)
[1] 8
ind_cont_PCA1_top %>% rownames_to_column() %>% arrange(desc(PCA))
The top 8 KOs contribute 1.2% of the total variation of PC1
sum(ind_cont_PCA2_top$PCA) / sum(ind_cont_PCA2$PCA)
[1] 0.03741334
nrow(ind_cont_PCA2_top)
[1] 17
ind_cont_PCA2_top %>% rownames_to_column() %>% arrange(desc(PCA))
17 KOs contributed 3.7% of the variation of PC2
sum(ind_cont_PCA3_top$PCA) / sum(ind_cont_PCA3$PCA)
[1] 0.1691838
nrow(ind_cont_PCA3_top)
[1] 49
ind_cont_PCA3_top %>% rownames_to_column() %>% arrange(desc(PCA))
49 KOs contribute 17% of the variation
pre-filtering the clr-transformed KO predictions that correlate with Dloss
(Spearman, bonferroni corrected p-values)
ind_cont_PCA_top %>% rownames_to_column(var = "pathway") %>%
subset(pathway %in% as.vector(alx.cor.sig_KO_metagenome_dloss$rowname))
NA
latest openly available KEGG mapping file is from 2011
(ps_clr.pw <- microbiome::transform(physeq.pw, "clr"))
phyloseq-class experiment-level object
otu_table() OTU Table: [ 168 taxa and 36 samples ]
sample_data() Sample Data: [ 36 samples by 77 sample variables ]
phyloseq::otu_table(ps_clr.pw)[1:6, 1:6]
OTU Table: [6 taxa and 6 samples]
taxa are rows
C-1 C-10 C-11 C-12 C-13 C-14
Glycolysis / Gluconeogenesis 1.984978 1.893012 1.909584 2.007376 1.977939 1.898468
Citrate cycle (TCA cycle) 2.271239 2.216183 2.238249 2.327457 2.284068 2.204037
Pentose phosphate pathway 2.226468 2.125080 2.168618 2.247038 2.247256 2.171432
Pentose and glucuronate interconversions 1.485162 1.389241 1.403059 1.496271 1.499278 1.431174
Fructose and mannose metabolism 1.414391 1.318097 1.352803 1.423055 1.410451 1.326772
Galactose metabolism 1.573259 1.435351 1.469676 1.567618 1.576223 1.505246
ord_clr.pw <- phyloseq::ordinate(ps_clr.pw, "RDA")
#Examine eigenvalues and % prop. variance explained
head(ord_clr.pw$CA$eig)
PC1 PC2 PC3 PC4 PC5 PC6
36.15911 24.84179 20.38784 17.94773 15.36852 10.85460
sapply(ord_clr.pw$CA$eig[1:8], function(x) x / sum(ord_clr.pw$CA$eig))
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
0.20981354 0.14414472 0.11830063 0.10414188 0.08917596 0.06298389 0.04099468 0.03906864
Joining, by = "#SampleID"
r_table.Var1 | r_table.Var2 | r_table.value | p_table.value |
---|---|---|---|
CN | Dloss | -0.9207945 | 0.0000000 |
CN | ParticulateC | -0.8295257 | 0.0000003 |
CN | Delta.C.Frac | 0.8243774 | 0.0000004 |
CN | Gibbsite% | 0.7344102 | 0.0002205 |
CN | ResistantC | 0.7300341 | 0.0002795 |
TC | CN | 0.7284896 | 0.0003034 |
CN | Illite% | -0.7277174 | 0.0003158 |
D1517 | CN | 0.7260847 | 0.0003442 |
CN | CP | 0.6827981 | 0.0028120 |
CN | Organic Matter | 0.6824120 | 0.0028520 |
CN | Silt_1 | -0.6808675 | 0.0030456 |
TN | CN | 0.6269714 | 0.0257867 |
pc1 | CN | 0.6228200 | 0.0297277 |
CN | C_Sand | 0.6102066 | 0.0458238 |
ind.coord.pw <- data.frame(ord_clr.pw$CA$v)
sdev_ind.pw <- apply(ind.coord.pw, 1, sd)
ind_cont_PCA1.pw <- data.frame(PCA = (100*(1 / nrow(ind.coord.pw)*(ind.coord.pw$PC1^2 /sdev_ind.pw))))
ind_cont_PCA1_top.pw <- ind_cont_PCA1.pw %>%
rownames_to_column("otu") %>%
filter(PCA >= 0.008)%>%
column_to_rownames("otu")
ind_cont_PCA2.pw <- data.frame(PCA = (100*(1 / nrow(ind.coord.pw)*(ind.coord.pw$PC2^2 /sdev_ind.pw))))
ind_cont_PCA2_top.pw <- ind_cont_PCA2.pw %>%
rownames_to_column("otu") %>%
filter(PCA >= 0.02)%>%
column_to_rownames("otu")
ind_cont_PCA3.pw <- data.frame(PCA = (100*(1 / nrow(ind.coord.pw)*(ind.coord.pw$PC3^2 / sdev_ind.pw))))
ind_cont_PCA3_top.pw <- ind_cont_PCA3.pw %>%
rownames_to_column("otu") %>%
filter(PCA >= 0.03)%>%
column_to_rownames("otu")
ind_cont_PCA_top.pw <- rbind(ind_cont_PCA1_top.pw, ind_cont_PCA2_top.pw, ind_cont_PCA3_top.pw)
# plot showing pathways that are contributing the most
clr1.pw <- ord_clr.pw$CA$eig[1] / sum(ord_clr.pw$CA$eig)
clr2.pw <- ord_clr.pw$CA$eig[2] / sum(ord_clr.pw$CA$eig)
p1.2 <- phyloseq::plot_ordination(ps_clr.pw, ord_clr.pw, type="taxa") +
coord_fixed(clr2.pw / clr1.pw) +
ggtitle("PCA on picrust predictions", subtitle = "(centred-log ratio)")
No available covariate data to map on the points for this plot `type`
p1.2$data <- p1.2$data %>% rownames_to_column(var = "OTU") %>% filter(OTU %in% rownames(ind_cont_PCA_top.pw))
require(ggrepel)
p1.2 + geom_text_repel(label = p1.2$data$OTU) +
coord_fixed(xlim = c(-2.5,2.5))
Coordinate system already present. Adding new coordinate system, which will replace the existing one.
sum(ind_cont_PCA1_top.pw$PCA) / sum(ind_cont_PCA1.pw$PCA)
[1] 0.8614549
nrow(ind_cont_PCA1_top.pw)
[1] 30
ind_cont_PCA1_top.pw %>% rownames_to_column() %>% arrange(desc(PCA))
30 pathways contribute 86% of the variation to PC1
sum(ind_cont_PCA2_top.pw$PCA) / sum(ind_cont_PCA2.pw$PCA)
[1] 0.9238069
nrow(ind_cont_PCA2_top.pw)
[1] 14
ind_cont_PCA2_top.pw %>% rownames_to_column() %>% arrange(desc(PCA))
14 pathways contribute 92% of the variation to PC2
sum(ind_cont_PCA3_top.pw$PCA) / sum(ind_cont_PCA3.pw$PCA)
[1] 0.9430541
nrow(ind_cont_PCA3_top.pw)
[1] 12
ind_cont_PCA3_top.pw %>% rownames_to_column() %>% arrange(desc(PCA))
12 pathways contribute 94% of variation of PC3
pre-filtering the clr-transformed pathway predictions that correlate with Dloss
(Spearman, bonferroni corrected p-values)
X1 | rowname | r | p | BH |
---|---|---|---|---|
10 | Nitrotoluene degradation | -3.939458 | 0.0007508 | 0.0282023 |
13 | Caprolactam degradation | -3.882228 | 0.0010714 | 0.0308874 |
5 | Novobiocin biosynthesis | 3.571833 | 0.0019547 | 0.0346315 |
14 | Isoflavonoid biosynthesis | 3.585446 | 0.0025726 | 0.0351884 |
9 | Aminobenzoate degradation | -3.650042 | 0.0020745 | 0.0383916 |
11 | Styrene degradation | -3.585659 | 0.0022433 | 0.0397478 |
7 | Xylene degradation | -3.407035 | 0.0026174 | 0.0421643 |
4 | Bisphenol degradation | -3.351442 | 0.0034821 | 0.0483770 |
8 | Polycyclic aromatic hydrocarbon degradation | -3.271920 | 0.0049180 | 0.0554459 |
1 | Geraniol degradation | -3.283802 | 0.0049243 | 0.0576555 |
3 | Benzoate degradation | -3.121054 | 0.0076213 | 0.0714633 |
16 | Basal transcription factors | 2.912020 | 0.0087980 | 0.0718041 |
12 | Limonene and pinene degradation | -3.058269 | 0.0079924 | 0.0741012 |
6 | Biosynthesis of 12-, 14- and 16-membered macrolides | 3.228308 | 0.0121400 | 0.0759852 |
2 | Chlorocyclohexane and chlorobenzene degradation | -2.934814 | 0.0116668 | 0.0900326 |
15 | ABC transporters | -2.877058 | 0.0126350 | 0.0949811 |
17 | Spliceosome | 3.330436 | 0.0246215 | 0.0952889 |
ind_cont_PCA_top.pw %>% rownames_to_column(var = "pathway") %>%
subset(pathway %in% as.vector(alx.cor.sig$rowname))
Ancom