Overall observations

Any associations with dieldrin loss?

Which microorganisms can be associated with this?

Is this associated with Soil organic matter decomposition?

How does this relate to known xenobiotic degradation pathways/enzymes?

What are the limitations of picrust predictions?

Metadata

[1] 11313798
`set.seed(TRUE)` was used to initialize repeatable random subsampling.
Please record this for your records so others can reproduce.
Try `set.seed(TRUE); .Random.seed` for the full vector
...
1OTUs were removed because they are no longer 
present in any sample after random subsampling

...

Phyloseq objects

ps object for KO metagenome

physeq
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 6728 taxa and 36 samples ]
sample_data() Sample Data:       [ 36 samples by 77 sample variables ]

ps object for the pathway file

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 168 taxa and 36 samples ]
sample_data() Sample Data:       [ 36 samples by 77 sample variables ]

KO_METAGENOME PCA plots and correlation tables.

PCA Aitchison biplot on KO predictions and correlations of axis-scores with environmental variables and significant features

PCA will reduce the noise and spits out only relevant variation into independant axes. The question then is what drives the variation of these axis, i.e. the variation in enzymes as predicted by picrust2 and which enzyme are driven more or less in these axis.

clr tranform of metagenome predictions

(ps_clr <- microbiome::transform(physeq, "clr"))  
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 6728 taxa and 36 samples ]
sample_data() Sample Data:       [ 36 samples by 77 sample variables ]
phyloseq::otu_table(ps_clr)[1:5, 1:6]
OTU Table:          [5 taxa and 6 samples]
                     taxa are rows
               C-1       C-10       C-11       C-12       C-13       C-14
K00001  3.08232112  3.4835033  3.1461450  3.4613316  3.5909267  3.5909645
K00002 -0.66688751 -1.7160013 -6.9421413 -6.7524959 -6.6424699 -6.6426013
K00003  3.73985162  4.2517402  4.0085975  4.1580456  4.2861852  4.2783313
K00004  0.03585803 -0.2907328 -0.3148322  0.2894002 -0.1249358 -0.2936201
K00005 -0.61196184 -3.7181646 -1.0426408 -2.7843547 -2.0381677 -1.5851315

PCA via phyloseq

ord_clr <- phyloseq::ordinate(ps_clr, "RDA")
#Examine eigenvalues and % prop. variance explained
  head(ord_clr$CA$eig)
      PC1       PC2       PC3       PC4       PC5       PC6 
1182.4395  905.7623  579.1140  454.5662  412.6621  390.8011 
sapply(ord_clr$CA$eig[1:8], function(x) x / sum(ord_clr$CA$eig))
       PC1        PC2        PC3        PC4        PC5        PC6        PC7        PC8 
0.16245723 0.12444411 0.07956539 0.06245356 0.05669630 0.05369278 0.04501448 0.03858885 

Correlations to PC1, PC2 and PC3 (with BH corrected p-values)

adding the pc scores to metadata

Correlations with psych package, selecting the relevant variables

Filtering of significant correlations

KOs contributing to the variation in PC1, PC2 and PC3

ind.coord <- data.frame(ord_clr$CA$v)
sdev_ind <- apply(ind.coord, 1, sd)
ind_cont_PCA1 <- data.frame(PCA = (100*(1 / nrow(ind.coord)*(ind.coord$PC1^2 /sdev_ind))))
ind_cont_PCA1_top <- ind_cont_PCA1 %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.001803355)%>% 
  column_to_rownames("otu")
ind_cont_PCA2 <- data.frame(PCA = (100*(1 / nrow(ind.coord)*(ind.coord$PC2^2 /sdev_ind))))
ind_cont_PCA2_top <- ind_cont_PCA2 %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.002)%>% 
  column_to_rownames("otu")
ind_cont_PCA3 <- data.frame(PCA = (100*(1 / nrow(ind.coord)*(ind.coord$PC3^2 / sdev_ind))))
ind_cont_PCA3_top <- ind_cont_PCA3 %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.002)%>% 
  column_to_rownames("otu")
ind_cont_PCA_top <- rbind(ind_cont_PCA1_top, ind_cont_PCA2_top, ind_cont_PCA3_top)

Visualisation of PCA for KO metagenome

sum(ind_cont_PCA1_top$PCA) / sum(ind_cont_PCA1$PCA)
[1] 0.01260249
nrow(ind_cont_PCA1_top)
[1] 8
ind_cont_PCA1_top %>% rownames_to_column() %>% arrange(desc(PCA))

The top 8 KOs contribute 1.2% of the total variation of PC1

sum(ind_cont_PCA2_top$PCA) / sum(ind_cont_PCA2$PCA)
[1] 0.03741334
nrow(ind_cont_PCA2_top)
[1] 17
ind_cont_PCA2_top  %>% rownames_to_column() %>% arrange(desc(PCA))

17 KOs contributed 3.7% of the variation of PC2

sum(ind_cont_PCA3_top$PCA) / sum(ind_cont_PCA3$PCA)
[1] 0.1691838
nrow(ind_cont_PCA3_top)
[1] 49
ind_cont_PCA3_top  %>% rownames_to_column() %>% arrange(desc(PCA))

49 KOs contribute 17% of the variation

One more: Lets test which KO-predictions (clr transformed) signicantly correlate to dieldrin loss - and graph those

pre-filtering the clr-transformed KO predictions that correlate with Dloss
(Spearman, bonferroni corrected p-values)

There were 55 significantly correlated KOs when running aldex.cor with Dloss as variable (BH p <= 0.02)?

Why Aldex? Anova-Like Differential Expression for high throughput data (Aldex), taking sample variation into account.

How many of those were also top contributers to PC1, PC2 or PC3?

ind_cont_PCA_top  %>% rownames_to_column(var = "pathway") %>% 
  subset(pathway %in% as.vector(alx.cor.sig_KO_metagenome_dloss$rowname))
NA

How many of those match the spearman correlations?

spearman.corr scatterplots

Aldex.corr scatterplots

KEGG pathway PCA plot (from KO metagenome mapping)

latest openly available KEGG mapping file is from 2011

clr transform

(ps_clr.pw <- microbiome::transform(physeq.pw, "clr"))  
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 168 taxa and 36 samples ]
sample_data() Sample Data:       [ 36 samples by 77 sample variables ]
phyloseq::otu_table(ps_clr.pw)[1:6, 1:6]
OTU Table:          [6 taxa and 6 samples]
                     taxa are rows
                                              C-1     C-10     C-11     C-12     C-13     C-14
Glycolysis / Gluconeogenesis             1.984978 1.893012 1.909584 2.007376 1.977939 1.898468
Citrate cycle (TCA cycle)                2.271239 2.216183 2.238249 2.327457 2.284068 2.204037
Pentose phosphate pathway                2.226468 2.125080 2.168618 2.247038 2.247256 2.171432
Pentose and glucuronate interconversions 1.485162 1.389241 1.403059 1.496271 1.499278 1.431174
Fructose and mannose metabolism          1.414391 1.318097 1.352803 1.423055 1.410451 1.326772
Galactose metabolism                     1.573259 1.435351 1.469676 1.567618 1.576223 1.505246

PCA via phyloseq


ord_clr.pw <- phyloseq::ordinate(ps_clr.pw, "RDA")
#Examine eigenvalues and % prop. variance explained
  head(ord_clr.pw$CA$eig)
     PC1      PC2      PC3      PC4      PC5      PC6 
36.15911 24.84179 20.38784 17.94773 15.36852 10.85460 
sapply(ord_clr.pw$CA$eig[1:8], function(x) x / sum(ord_clr.pw$CA$eig))
       PC1        PC2        PC3        PC4        PC5        PC6        PC7        PC8 
0.20981354 0.14414472 0.11830063 0.10414188 0.08917596 0.06298389 0.04099468 0.03906864 

Correlations of soil variables to PC1, PC2 and PC3 (with BH corrected p-values)

Joining, by = "#SampleID"

CN correlations with other soil variables (comparing 36 soil variables, BH p < 0.05)

r_table.Var1 r_table.Var2 r_table.value p_table.value
CN Dloss -0.9207945 0.0000000
CN ParticulateC -0.8295257 0.0000003
CN Delta.C.Frac 0.8243774 0.0000004
CN Gibbsite% 0.7344102 0.0002205
CN ResistantC 0.7300341 0.0002795
TC CN 0.7284896 0.0003034
CN Illite% -0.7277174 0.0003158
D1517 CN 0.7260847 0.0003442
CN CP 0.6827981 0.0028120
CN Organic Matter 0.6824120 0.0028520
CN Silt_1 -0.6808675 0.0030456
TN CN 0.6269714 0.0257867
pc1 CN 0.6228200 0.0297277
CN C_Sand 0.6102066 0.0458238

PCA plot for KEGG pathways

Pathways contributing to the variation in PC1, PC2 and PC3



ind.coord.pw <- data.frame(ord_clr.pw$CA$v)
sdev_ind.pw <- apply(ind.coord.pw, 1, sd)
ind_cont_PCA1.pw <- data.frame(PCA = (100*(1 / nrow(ind.coord.pw)*(ind.coord.pw$PC1^2 /sdev_ind.pw))))
ind_cont_PCA1_top.pw <- ind_cont_PCA1.pw %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.008)%>% 
  column_to_rownames("otu")
ind_cont_PCA2.pw <- data.frame(PCA = (100*(1 / nrow(ind.coord.pw)*(ind.coord.pw$PC2^2 /sdev_ind.pw))))
ind_cont_PCA2_top.pw <- ind_cont_PCA2.pw %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.02)%>% 
  column_to_rownames("otu")
ind_cont_PCA3.pw <- data.frame(PCA = (100*(1 / nrow(ind.coord.pw)*(ind.coord.pw$PC3^2 / sdev_ind.pw))))
ind_cont_PCA3_top.pw <- ind_cont_PCA3.pw %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.03)%>% 
  column_to_rownames("otu")
ind_cont_PCA_top.pw <- rbind(ind_cont_PCA1_top.pw, ind_cont_PCA2_top.pw, ind_cont_PCA3_top.pw)
# plot showing pathways that are contributing the most
clr1.pw <- ord_clr.pw$CA$eig[1] / sum(ord_clr.pw$CA$eig)
clr2.pw <- ord_clr.pw$CA$eig[2] / sum(ord_clr.pw$CA$eig)
p1.2 <- phyloseq::plot_ordination(ps_clr.pw, ord_clr.pw, type="taxa") +
    coord_fixed(clr2.pw / clr1.pw) +
  ggtitle("PCA on picrust predictions", subtitle = "(centred-log ratio)")
No available covariate data to map on the points for this plot `type`
p1.2$data <-  p1.2$data  %>% rownames_to_column(var = "OTU") %>%  filter(OTU %in% rownames(ind_cont_PCA_top.pw))
require(ggrepel)
p1.2 + geom_text_repel(label = p1.2$data$OTU) +
    coord_fixed(xlim = c(-2.5,2.5)) 
Coordinate system already present. Adding new coordinate system, which will replace the existing one.

sum(ind_cont_PCA1_top.pw$PCA) / sum(ind_cont_PCA1.pw$PCA)
[1] 0.8614549
nrow(ind_cont_PCA1_top.pw)
[1] 30
ind_cont_PCA1_top.pw %>% rownames_to_column() %>% arrange(desc(PCA))

30 pathways contribute 86% of the variation to PC1

sum(ind_cont_PCA2_top.pw$PCA) / sum(ind_cont_PCA2.pw$PCA)
[1] 0.9238069
nrow(ind_cont_PCA2_top.pw)
[1] 14
ind_cont_PCA2_top.pw  %>% rownames_to_column() %>% arrange(desc(PCA))

14 pathways contribute 92% of the variation to PC2

sum(ind_cont_PCA3_top.pw$PCA) / sum(ind_cont_PCA3.pw$PCA)
[1] 0.9430541
nrow(ind_cont_PCA3_top.pw)
[1] 12
ind_cont_PCA3_top.pw %>% rownames_to_column() %>% arrange(desc(PCA))

12 pathways contribute 94% of variation of PC3

Pathway (clr transformed) correlations with dieldrin loss

pre-filtering the clr-transformed pathway predictions that correlate with Dloss
(Spearman, bonferroni corrected p-values)

The novobiocin biosynthesis and the pathway for x-membered macrolides were also identified as signifiant using the aldex package with function aldex.corr

In total there were 17 significant pathways (BH < 0.1) based on aldex.corr

(Dieldrin concentrations were not significantly associated with any pathways (aldex.corr), only dieldrin loss)

X1 rowname r p BH
10 Nitrotoluene degradation -3.939458 0.0007508 0.0282023
13 Caprolactam degradation -3.882228 0.0010714 0.0308874
5 Novobiocin biosynthesis 3.571833 0.0019547 0.0346315
14 Isoflavonoid biosynthesis 3.585446 0.0025726 0.0351884
9 Aminobenzoate degradation -3.650042 0.0020745 0.0383916
11 Styrene degradation -3.585659 0.0022433 0.0397478
7 Xylene degradation -3.407035 0.0026174 0.0421643
4 Bisphenol degradation -3.351442 0.0034821 0.0483770
8 Polycyclic aromatic hydrocarbon degradation -3.271920 0.0049180 0.0554459
1 Geraniol degradation -3.283802 0.0049243 0.0576555
3 Benzoate degradation -3.121054 0.0076213 0.0714633
16 Basal transcription factors 2.912020 0.0087980 0.0718041
12 Limonene and pinene degradation -3.058269 0.0079924 0.0741012
6 Biosynthesis of 12-, 14- and 16-membered macrolides 3.228308 0.0121400 0.0759852
2 Chlorocyclohexane and chlorobenzene degradation -2.934814 0.0116668 0.0900326
15 ABC transporters -2.877058 0.0126350 0.0949811
17 Spliceosome 3.330436 0.0246215 0.0952889

boxplots of degradation pathways

boxplots of remaining pathways

How many of those pathways are also associated with PC1, PC2 or PC3?

ind_cont_PCA_top.pw  %>% rownames_to_column(var = "pathway") %>% 
  subset(pathway %in% as.vector(alx.cor.sig$rowname))

Ancom significant different pathways by soil

Ancom

Aldex (aldex.clr(aldex.df, cond, mc.samples=10000, denom=“iqlr”, verbose=F) to find significant different pathways for each soil (29 pathways p <= 0.01 BH corrected)

---
title: "Picrust2 downstream stuff"
subtitle: "KO metagenome and KEGG pathways. Pool of results and some overall observations (this is for me only so you might be bored reading this)."
output: 
  html_notebook:
    toc: yes
---


# Overall observations

**Any associations with  dieldrin loss?**

- Isoflavonoid  and Novobiocin biosynthesis (KEGG pathways)
- PWY-5531, PWY-7159 (chlorophyllide biosynthesis, aerobic or anaerobic Metacyc)
- All other pathways are associated with lower dieldrin loss including xenobiotic degradation pathways (Nitrotoluene, caprolactam, aminobenzoate, styrene, xylene, bisphenol and PAH degradation
  - Why is there a higher xenobiotic degradation potential in soils that had lower losses?
  - ABC transporters (competitive traint @Wood2018) are also high in the soil with lowest dieldrin loss, together with 
- NO DDT metabolism (either as KOs or pathways) was present
- NO Naphtalene metabolism was present - although three related enzymes were present (K00480 Salicylate hydroxylase , K18242 salicylate 5-hydroxylase large subunit and K13953alcohol dehydrogenase, propanol-preferring). These enzymes can be part of other biodegradation pathways too. However, they were not correlated with levels of dieldrin loss (no enzyme was found to be associated to dieldrin concentrations either).


**Which microorganisms can be associated with this?** 
  
**Is this associated with Soil organic matter decomposition?**
  
- PCA1 (all 168 KEGG pathways) was correlated to CN ratio
- PCA1 (all 410 Metacyc pathways) was correlated to Humus% (clr)
  
**How does this relate to known xenobiotic degradation pathways/enzymes?**

- example xenobiotic drug metabolism in gut
  
**What are the limitations of picrust predictions?**  






```{r message=FALSE, warning=FALSE, echo=FALSE}
#mywd
setwd("~/Documents/Study/LaTrobe/Research/phD/Milestone2_Field_Survey/M2_Bacteria/Picrust")
#mypackages
library(ggrepel)
library(qiime2R)
library(tidyverse)
library(stringr)
library(ggplot2)
theme_set(theme_bw())  
library(vegan)
library(ggpubr)
library(RColorBrewer)
library(phyloseq)
library(psych)
library(reshape2)
library(ggfortify)
library(factoextra)
library(FactoMineR)
library(maigesPack)
library(knitr)
#files from picrust2 output
#Metacyc pathways predictions with NSTI set to 0.65
```


```{r message=FALSE, warning=FALSE, echo=FALSE}
KEGGpwdesc <- read_tsv("~/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/KEGG_pathways_info.tsv")

KEGG_pwabu <- read_tsv("~/Documents/Study/LaTrobe/Research/phD/Milestone2_Field_Survey/M2_Bacteria/Picrust/2020_0113_picrust_ASVminfreq10minspl2_2551ASV_0.65minNSTI/picrust2_out_pipeline/KEGG_pathways_out/path_abun_unstrat.tsv") %>% rename("ko_number" = "pathway")

pathway.pred <- inner_join(KEGG_pwabu,KEGGpwdesc, by = 'ko_number') %>% 
  select(pathway, everything()) %>% select(c(-ko_number)) %>% as_tibble() %>% column_to_rownames("pathway")


KO_metagenome <- read_tsv("~/Documents/Study/LaTrobe/Research/phD/Milestone2_Field_Survey/M2_Bacteria/Picrust/2020_0113_picrust_ASVminfreq10minspl2_2551ASV_0.65minNSTI/picrust2_out_pipeline/KO_metagenome_out/pred_metagenome_unstrat.tsv") %>% 
          column_to_rownames("function")
```

# Metadata
```{r results = 'hide', echo=FALSE}
metadata<-read_tsv("~/Documents/Study/LaTrobe/Research/phD/Milestone2_Field_Survey/M2_Bacteria/M2_Qiime_Bacteria/metadata.tsv") #requires tidyverse
metadata2 <- metadata[c(2:37),]
# Converting columns to numeric using "tidyverse"
metadata2 <- metadata[c(2:37),] %>%
  rownames_to_column("spl")%>%
  mutate_all(type.convert) %>%
  mutate_if(is.factor, as.character)%>%
  column_to_rownames("spl")
sample_data(metadata[c(2:37),] %>% as.data.frame() %>% column_to_rownames("#SampleID"))
# Add clr tranformed carbon composition
#Compute percentage frequencies
percentData <- data.frame(metadata2[c(37:39,41)])
sum_c <- apply(percentData, 1, sum) #Equates to TOC
percentData <- 100*percentData/sum_c 
percentData <- round(percentData, 2) # Round the sorted output to 1 digit
require(compositions)
c_comp <- data.frame(clr(percentData))
c_comp$`#SampleID` <- metadata2$`#SampleID`
metadata2 <- metadata2 %>% left_join(c_comp, by = "#SampleID")
metadata2 <- metadata2 %>% rename(Humus = HUM.y) %>% rename(ParticulateC = POC.y ) %>% rename(ResistantC = ROC.y )
#defining factors
metadata2$Soil <- factor(metadata2$Soil, levels=c("Kurosol", "Chromosol"))
metadata2$Paddock <- factor(metadata2$Paddock, 
                            levels=c("1", "2", "3", "4","5", "6","7", "8","9", "10","11", "12"))
metadata2$Dloss_fct <-  factor(metadata2$Dloss_fct, 
                               levels=c("over20", "over40", "over60", "over70","over80"))
```


```{r warning=FALSE,echo=FALSE}
### create ps object ####
physeq <-phyloseq(
        otu_table(KO_metagenome, taxa_are_rows = T), 
        sample_data(metadata2 %>% as.data.frame() %>% column_to_rownames("#SampleID"))
)

#adding KO diversity measures (with ps object) to metadata and re-create ps object 
min(colSums(otu_table(physeq)))
rare_Bac <- rarefy_even_depth(physeq, sample.size = min(colSums(otu_table(physeq))),  rngseed = TRUE, replace = TRUE, trimOTUs = TRUE)
ps.div <- estimate_richness(rare_Bac)
ps.div$`#SampleID` <- sample_names(physeq)
metadata2 <- metadata2 %>% 
  left_join(ps.div, by = "#SampleID")
metadata2$pielou_ev <- ps.div$Shannon/log(ps.div$Observed)      #Pielou eveness
#adding carbon ratios
metadata2 <- metadata2 %>% 
  mutate(shannon_C = Shannon / TC) %>%
  mutate(simpson_C = InvSimpson / TC) %>% 
  mutate(evenness_C = pielou_ev / TC) %>% 
  mutate(bcop_C = b_copies_ngDNA / TC) %>% 
  mutate(fcop_C  = f_copies_ngDNA / TC) 
physeq<-phyloseq(
  otu_table(KO_metagenome, taxa_are_rows = T), 
  sample_data(metadata2 %>% as.data.frame() %>% column_to_rownames("#SampleID"))
)
```
# Phyloseq objects
ps object for KO metagenome
```{r}
physeq
```

ps object  for the pathway file
```{r, echo=FALSE}
# now for the pathway file
physeq.pw <- phyloseq(otu_table(
             pathway.pred, taxa_are_rows = T), 
             sample_data(metadata2 %>% as.data.frame() %>% column_to_rownames("#SampleID"))
)
physeq.pw
```


```{r results = 'hide',echo=FALSE}
#adding pathway diversity measures to metadata 
#and this time no need to add carbon ratios for the pathways
min(colSums(otu_table(physeq.pw)))
rare_Bac.pw <- rarefy_even_depth(physeq.pw, sample.size = min(colSums(otu_table(physeq.pw))),  rngseed = TRUE, replace = TRUE, trimOTUs = TRUE)
ps.div.pw <- estimate_richness(rare_Bac.pw)
ps.div.pw$`#SampleID` <- sample_names(physeq.pw)
metadata3 <- metadata2 %>% 
  left_join(ps.div.pw, by = "#SampleID")
metadata3$pielou_ev <- ps.div.pw$Shannon/log(ps.div.pw$Observed)      #Pielou eveness
physeq.pw <-phyloseq(
  otu_table(pathway.pred, taxa_are_rows = T), 
  sample_data(metadata2 %>% as.data.frame() %>% column_to_rownames("#SampleID"))
)
physeq.pw
```
# KO_METAGENOME PCA plots and correlation tables. 
## PCA Aitchison biplot on KO predictions and correlations of axis-scores with environmental variables and significant features
PCA will reduce the noise and spits out only relevant variation into independant axes.
The question then is what drives the variation of these axis, i.e. the variation in enzymes as predicted by picrust2 and which enzyme are driven more or less in these axis.  
  
### clr tranform of metagenome predictions
```{r}
(ps_clr <- microbiome::transform(physeq, "clr"))  
phyloseq::otu_table(ps_clr)[1:5, 1:6]
```



### PCA via phyloseq
```{r}
ord_clr <- phyloseq::ordinate(ps_clr, "RDA")
#Examine eigenvalues and % prop. variance explained
  head(ord_clr$CA$eig)
sapply(ord_clr$CA$eig[1:8], function(x) x / sum(ord_clr$CA$eig))
```

### Correlations to PC1, PC2 and PC3 (with BH corrected *p*-values)
adding the pc scores to metadata  
```{r,echo=FALSE, message=FALSE, warning=FALSE}

clr_pcs <- data.frame(
  "pc1" = ord_clr$CA$u[,1],
  "pc2" = ord_clr$CA$u[,2],
  "pc3" = ord_clr$CA$u[,3]) %>% 
  rownames_to_column("#SampleID") %>%
  left_join(metadata3)
```


Correlations with psych package, selecting the relevant variables  
```{r,echo=FALSE}
require(psych)
require(reshape2)
correlation_matrix=corr.test(clr_pcs[,c("pc1","pc2", "pc3","D1517","ppDDE", "ppDDT", "DDT1517", "TC","TN","CN" 
                                        ,"HC", "P",   "CP" ,"pH" ,"b_copies_ngDNA" ,"f_copies_ngDNA","Dloss" ,"DDTloss","C_Sand","F_Sand" ,
                                         "T_Sand","Silt_1", "Clay_1","Organic Matter", "Field Capacity", "Wilting Point", "Kaolinite%" ,"Illite%", 
                                        "Smectite%", "Gibbsite%" ,
                                        "Clay_total","ParticulateC","Humus" ,"ResistantC" , "Delta.C.Frac","Observed.x", "pielou_ev","Shannon.x")],
                             method="spearman", adjust="holm")
r_matrix=correlation_matrix$r
r_matrix[row(r_matrix)>col(r_matrix)]=0
diag(r_matrix)=0
p_matrix=correlation_matrix$p
p_matrix[row(p_matrix)>col(p_matrix)]=1
diag(p_matrix)=1
r_table=melt(r_matrix)
p_table=melt(p_matrix)
cortable <- data.frame(r_table$Var1, r_table$Var2, r_table$value, p_table$value) #creats the table with r and p values
```

Filtering of significant correlations
```{r,echo=FALSE, message=FALSE, warning=FALSE}
#filter only pc1 related correlations
pc1_cor <- cortable %>% 
  filter(r_table.Var1 == "pc1" | 
           r_table.Var2 == "pc1") %>% 
  filter(p_table.value <= 0.06) %>%
  arrange(p_table.value)
?arrange
#filter only pc2 related correlations
pc2_cor <- cortable %>% 
  filter(r_table.Var1 == "pc2" | 
           r_table.Var2 == "pc2") %>% 
  filter(p_table.value <= 0.06) %>%
  arrange(p_table.value)

#filter only pc3 related correlations
pc3_cor <- cortable %>% 
  filter(r_table.Var1 == "pc3" | 
           r_table.Var2 == "pc3")%>% 
  filter(p_table.value <= 0.06) %>%
  arrange(p_table.value)
pc1_cor
pc2_cor
pc3_cor
```


### KOs contributing to the variation in PC1, PC2 and PC3
```{r fig.height=8, fig.width=8, echo=TRUE}
ind.coord <- data.frame(ord_clr$CA$v)
sdev_ind <- apply(ind.coord, 1, sd)
ind_cont_PCA1 <- data.frame(PCA = (100*(1 / nrow(ind.coord)*(ind.coord$PC1^2 /sdev_ind))))
ind_cont_PCA1_top <- ind_cont_PCA1 %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.001803355)%>% 
  column_to_rownames("otu")
ind_cont_PCA2 <- data.frame(PCA = (100*(1 / nrow(ind.coord)*(ind.coord$PC2^2 /sdev_ind))))
ind_cont_PCA2_top <- ind_cont_PCA2 %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.002)%>% 
  column_to_rownames("otu")
ind_cont_PCA3 <- data.frame(PCA = (100*(1 / nrow(ind.coord)*(ind.coord$PC3^2 / sdev_ind))))
ind_cont_PCA3_top <- ind_cont_PCA3 %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.002)%>% 
  column_to_rownames("otu")
ind_cont_PCA_top <- rbind(ind_cont_PCA1_top, ind_cont_PCA2_top, ind_cont_PCA3_top)
```

### Visualisation of PCA for KO metagenome
```{r, echo=FALSE}
# plot1 showing Resistant Carbon as colour gradient
# Scale axes and plot ordination
clr1 <- ord_clr$CA$eig[1] / sum(ord_clr$CA$eig)
clr2 <- ord_clr$CA$eig[2] / sum(ord_clr$CA$eig)
p1 <- phyloseq::plot_ordination(ps_clr, ord_clr, type="samples", color = "CN", shape = "Soil") +
    geom_point(aes(size = Dloss)) +
   scale_size(breaks = c(30,50,80)) +
    coord_fixed(clr2 / clr1) +
    scale_colour_gradient(low = "grey", high = "black")+
  ggtitle("PCA on picrust prediction", subtitle = "(centred-log ratio KO metagenome)")
p1
```

```{r, echo=FALSE, warning=FALSE, message=FALSE}
# plot showing pathways that are contributing the most
clr1 <- ord_clr$CA$eig[1] / sum(ord_clr$CA$eig)
clr2 <- ord_clr$CA$eig[2] / sum(ord_clr$CA$eig)
p1.12 <- phyloseq::plot_ordination(ps_clr, ord_clr, type="taxa") +
    coord_fixed(clr2 / clr1) +
  ggtitle("PCA, top contributing KOs")

p1.12$data <-  p1.12$data  %>% rownames_to_column(var = "OTU") %>%  filter(OTU %in% rownames(ind_cont_PCA_top))
require(ggrepel)
p1.12  <- p1.12 + geom_text_repel(label = p1.12$data$OTU) + coord_fixed(xlim = c(-5,5), ylim = c(-5,5)) 

p1.12
```



```{r fig.height=8, fig.width=8, echo=TRUE}
sum(ind_cont_PCA1_top$PCA) / sum(ind_cont_PCA1$PCA)
nrow(ind_cont_PCA1_top)
ind_cont_PCA1_top %>% rownames_to_column() %>% arrange(desc(PCA))
```
The top 8 KOs contribute 1.2% of the total variation of PC1


```{r fig.height=8, fig.width=8, echo=TRUE}
sum(ind_cont_PCA2_top$PCA) / sum(ind_cont_PCA2$PCA)
nrow(ind_cont_PCA2_top)
ind_cont_PCA2_top  %>% rownames_to_column() %>% arrange(desc(PCA))
```
17 KOs contributed 3.7% of the variation of PC2

```{r fig.height=8, fig.width=8, echo=TRUE}
sum(ind_cont_PCA3_top$PCA) / sum(ind_cont_PCA3$PCA)
nrow(ind_cont_PCA3_top)
ind_cont_PCA3_top  %>% rownames_to_column() %>% arrange(desc(PCA))
```
49 KOs contribute 17% of the variation


## One more: Lets test which KO-predictions (clr transformed) signicantly correlate to dieldrin loss - and graph those 
pre-filtering the clr-transformed KO predictions that correlate with Dloss  
(Spearman, bonferroni corrected *p*-values)

```{r, echo = FALSE, warning=FALSE}
Dloss <- metadata2$Dloss
otutable <- data.frame(t(phyloseq::otu_table(ps_clr)))
otutable <- otutable[match(metadata2$`#SampleID`, row.names(otutable)),] #make sure its in order with variable
test <- apply(otutable ,2, cor.test, Dloss, method = "spearman")
test.p <- data.frame(pvals = sapply(test, "[[", "p.value"))
test.p$pval.adj <- p.adjust(test.p$pvals, method = "bonferroni", n = length(test.p$pvals))
test.p.sig <-  test.p %>% 
  rownames_to_column(var = "KO")%>% 
    filter(pval.adj <= 0.05)%>%
    arrange(pval.adj)
test.p.sig
```





### There were 55 significantly correlated KOs when running aldex.cor with Dloss as variable (BH *p* <= 0.02)?
Why Aldex? Anova-Like Differential Expression for high throughput data [(Aldex)](https://rdrr.io/bioc/ALDEx2/), taking sample variation into account. 

```{r echo=FALSE, message=FALSE, warning=FALSE}
alx.cor.sig_KO_metagenome_dloss <- read_csv("~/Documents/Study/LaTrobe/Research/phD/Milestone2_Field_Survey/M2_Bacteria/Picrust/alx.cor.sig_KO_metagenome_dloss_2020021.csv") %>% arrange(BH)

alx.cor.sig_KO_metagenome_dloss 
```

### How many of those were also top contributers to PC1, PC2 or PC3?
```{r}
ind_cont_PCA_top  %>% rownames_to_column(var = "pathway") %>% 
  subset(pathway %in% as.vector(alx.cor.sig_KO_metagenome_dloss$rowname))

```

### How many of those match the spearman correlations? 
```{r echo=FALSE, message=FALSE, warning=FALSE}
alx.cor.sig_KO_metagenome_dloss <- read_csv("~/Documents/Study/LaTrobe/Research/phD/Milestone2_Field_Survey/M2_Bacteria/Picrust/alx.cor.sig_KO_metagenome_dloss_2020021.csv") %>% arrange(BH)

alx.cor.sig_KO_metagenome_dloss %>% 
  subset(rowname %in% as.vector(test.p.sig$KO))
```

### spearman.corr scatterplots
```{r fig.height=8, fig.width=9, echo=FALSE}
cls <- c("grey8","blue")
ps_clr.mlt <- psmelt(ps_clr) #melt dataframe for plotting
ps_clr.mlt  <- ps_clr.mlt  %>% subset(OTU %in% as.vector(test.p.sig$KO)) #filter the ECs that significant correlate with PC1,2 and 3
ps_clr.mlt %>% 
  ggplot(data = ., aes(x = Dloss, y = Abundance))+
  geom_point(aes(color = Soil, shape = Soil)) +
  labs(x = "", y = "Abundance (clr)\n") +
  facet_wrap(~ OTU, scales = "free") +
  ggtitle("KO predictions vs Dloss (spearman.corr p < 0.05, bonferroni corrected)")+ 
  theme(strip.text = element_text(size = 8))+
  scale_colour_manual(values = cls) +
  stat_cor(method = "spearman",label.y.npc = "bottom")
```

### Aldex.corr scatterplots
```{r fig.height=8, fig.width=10, echo=FALSE}
cls <- c("grey8","blue")
ps_clr.mlt <- psmelt(ps_clr) #melt dataframe for plotting
ps_clr.mlt  <- ps_clr.mlt  %>% subset(OTU %in% as.vector(alx.cor.sig_KO_metagenome_dloss$rowname)) #filter the ECs that significant correlate with PC1,2 and 3
ps_clr.mlt %>% 
  ggplot(data = ., aes(x = Dloss, y = Abundance))+
  geom_point(aes(color = Soil, shape = Soil)) +
  labs(x = "", y = "Abundance (clr)\n") +
  facet_wrap(~ OTU, scales = "free") +
  ggtitle("KO predictions vs Dloss (aldex.corr p < 0.02, BH corrected)")+ 
  theme(strip.text = element_text(size = 8))+
  scale_colour_manual(values = cls) +
  stat_cor(method = "spearman",label.y.npc = "bottom")
```
  
  
# KEGG pathway PCA plot (from KO metagenome mapping)
latest openly available KEGG mapping file is from 2011
 
## clr transform
```{r}
(ps_clr.pw <- microbiome::transform(physeq.pw, "clr"))  
phyloseq::otu_table(ps_clr.pw)[1:6, 1:6]
```

## PCA via phyloseq
```{r}

ord_clr.pw <- phyloseq::ordinate(ps_clr.pw, "RDA")
#Examine eigenvalues and % prop. variance explained
  head(ord_clr.pw$CA$eig)
sapply(ord_clr.pw$CA$eig[1:8], function(x) x / sum(ord_clr.pw$CA$eig))
```
 
 
##  Correlations of soil variables to PC1, PC2 and PC3 (with BH corrected *p*-values) 
```{r, echo=FALSE}
clr_pcs.pw <- data.frame(
  "pc1" = ord_clr.pw$CA$u[,1],
  "pc2" = ord_clr.pw$CA$u[,2],
  "pc3" = ord_clr.pw$CA$u[,3]) %>% 
  rownames_to_column("#SampleID") %>%
  left_join(metadata3)
```
 
```{r, echo=FALSE}
require(psych)
require(reshape2)
correlation_matrix=corr.test(clr_pcs.pw [,c("pc1","pc2", "pc3","D1517","ppDDE", "ppDDT", "DDT1517", "TC","TN","CN" 
                                        ,"HC", "P",   "CP" ,"pH" ,"b_copies_ngDNA" ,"f_copies_ngDNA","Dloss" ,"DDTloss","C_Sand","F_Sand" ,
                                         "T_Sand","Silt_1", "Clay_1","Organic Matter", "Field Capacity", "Wilting Point", "Kaolinite%" ,"Illite%", 
                                        "Smectite%", "Gibbsite%" ,
                                        "Clay_total","ParticulateC","Humus" ,"ResistantC" , "Delta.C.Frac","Observed.x", "pielou_ev","Shannon.x")],
                             method="spearman", adjust="holm")
r_matrix=correlation_matrix$r
r_matrix[row(r_matrix)>col(r_matrix)]=0
diag(r_matrix)=0
p_matrix=correlation_matrix$p
p_matrix[row(p_matrix)>col(p_matrix)]=1
diag(p_matrix)=1
r_table=melt(r_matrix)
p_table=melt(p_matrix)
cortable.pw <- data.frame(r_table$Var1, r_table$Var2, r_table$value, p_table$value) #creats the table with r and p values
```


```{r, echo= FALSE}
#filter only pc1 related correlations
pc1_cor.pw <- cortable.pw %>% 
  filter(r_table.Var1 == "pc1" | 
           r_table.Var2 == "pc1") %>% 
  filter(p_table.value <= 0.1) %>%
  arrange(p_table.value)
?arrange
#filter only pc2 related correlations
pc2_cor.pw <- cortable.pw %>% 
  filter(r_table.Var1 == "pc2" | 
           r_table.Var2 == "pc2") %>% 
  filter(p_table.value <= 0.1) %>%
  arrange(p_table.value)

#filter only pc3 related correlations
pc3_cor.pw <- cortable.pw %>% 
  filter(r_table.Var1 == "pc3" | 
           r_table.Var2 == "pc3")%>% 
  filter(p_table.value <= 0.1) %>%
  arrange(p_table.value)
pc1_cor.pw
pc2_cor.pw
pc3_cor.pw
```

### CN correlations with other soil variables (comparing 36 soil variables, BH *p* < 0.05)
```{r, echo= FALSE}
#filter only pc1 related correlations
CN_cor.pw <- cortable.pw %>% 
  filter(r_table.Var1 == "CN" | 
           r_table.Var2 == "CN") %>% 
  filter(p_table.value <= 0.05) %>%
  arrange(p_table.value)
kable(CN_cor.pw)
```


###  PCA plot for KEGG pathways 
```{r fig.height=5, fig.width=6, echo=FALSE}
##### plot1 showing Humus as colour gradient
# Scale axes and plot ordination
clr1.pw <- ord_clr.pw$CA$eig[1] / sum(ord_clr.pw$CA$eig)
clr2.pw <- ord_clr.pw$CA$eig[2] / sum(ord_clr.pw$CA$eig)
p1.1 <- phyloseq::plot_ordination(ps_clr.pw, ord_clr.pw, type="samples", color = "CN", shape= "Soil") +
    geom_point(aes(size = Dloss)) +
    coord_fixed(clr2.pw / clr1.pw) +
    scale_size(breaks = c(30,50,80))+
    scale_colour_gradient(low = "grey", high = "black")+
  ggtitle("PCA on picrust predictions", subtitle = "(centred-log ratio)")
p1.1
```

### Pathways contributing to the variation in PC1, PC2 and PC3
```{r fig.height=8, fig.width=8, echo=TRUE}


ind.coord.pw <- data.frame(ord_clr.pw$CA$v)
sdev_ind.pw <- apply(ind.coord.pw, 1, sd)
ind_cont_PCA1.pw <- data.frame(PCA = (100*(1 / nrow(ind.coord.pw)*(ind.coord.pw$PC1^2 /sdev_ind.pw))))
ind_cont_PCA1_top.pw <- ind_cont_PCA1.pw %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.008)%>% 
  column_to_rownames("otu")
ind_cont_PCA2.pw <- data.frame(PCA = (100*(1 / nrow(ind.coord.pw)*(ind.coord.pw$PC2^2 /sdev_ind.pw))))
ind_cont_PCA2_top.pw <- ind_cont_PCA2.pw %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.02)%>% 
  column_to_rownames("otu")
ind_cont_PCA3.pw <- data.frame(PCA = (100*(1 / nrow(ind.coord.pw)*(ind.coord.pw$PC3^2 / sdev_ind.pw))))
ind_cont_PCA3_top.pw <- ind_cont_PCA3.pw %>% 
  rownames_to_column("otu") %>% 
  filter(PCA >= 0.03)%>% 
  column_to_rownames("otu")
ind_cont_PCA_top.pw <- rbind(ind_cont_PCA1_top.pw, ind_cont_PCA2_top.pw, ind_cont_PCA3_top.pw)
```

```{r fig.height=8, fig.width=8, echo=TRUE}
# plot showing pathways that are contributing the most
clr1.pw <- ord_clr.pw$CA$eig[1] / sum(ord_clr.pw$CA$eig)
clr2.pw <- ord_clr.pw$CA$eig[2] / sum(ord_clr.pw$CA$eig)
p1.2 <- phyloseq::plot_ordination(ps_clr.pw, ord_clr.pw, type="taxa") +
    coord_fixed(clr2.pw / clr1.pw) +
  ggtitle("PCA on picrust predictions", subtitle = "(centred-log ratio)")

p1.2$data <-  p1.2$data  %>% rownames_to_column(var = "OTU") %>%  filter(OTU %in% rownames(ind_cont_PCA_top.pw))
require(ggrepel)
p1.2 + geom_text_repel(label = p1.2$data$OTU) +
    coord_fixed(xlim = c(-2.5,2.5)) 
```

```{r fig.height=8, fig.width=8, echo=TRUE}
sum(ind_cont_PCA1_top.pw$PCA) / sum(ind_cont_PCA1.pw$PCA)
nrow(ind_cont_PCA1_top.pw)
ind_cont_PCA1_top.pw %>% rownames_to_column() %>% arrange(desc(PCA))
```

30 pathways contribute 86% of the variation to PC1



```{r fig.height=8, fig.width=8, echo=TRUE}
sum(ind_cont_PCA2_top.pw$PCA) / sum(ind_cont_PCA2.pw$PCA)
nrow(ind_cont_PCA2_top.pw)
ind_cont_PCA2_top.pw  %>% rownames_to_column() %>% arrange(desc(PCA))
```
14 pathways contribute 92% of the variation to PC2

```{r fig.height=8, fig.width=8, echo=TRUE}
sum(ind_cont_PCA3_top.pw$PCA) / sum(ind_cont_PCA3.pw$PCA)
nrow(ind_cont_PCA3_top.pw)
ind_cont_PCA3_top.pw %>% rownames_to_column() %>% arrange(desc(PCA))
```
12 pathways contribute 94% of variation of PC3



## Pathway (clr transformed) correlations with dieldrin loss
pre-filtering the clr-transformed pathway predictions that correlate with Dloss  
(Spearman, bonferroni corrected *p*-values)

```{r, echo = FALSE, warning=FALSE}
Dloss <- metadata2$Dloss
otutable.pw <- phyloseq::otu_table(ps_clr.pw) %>% t %>% as.data.frame() 
otutable.pw <- otutable.pw[match(metadata2$`#SampleID`, row.names(otutable.pw)),] #make sure its in order with variable
test.pw <- apply(otutable.pw ,2, cor.test, Dloss, method = "spearman")
test.pw.p <- data.frame(pvals = sapply(test.pw, "[[", "p.value"))
test.pw.p$pval.adj <- p.adjust(test.pw.p$pvals, method = "bonferroni", n = length(test.pw.p$pvals))
test.pw.p.sig <-  test.pw.p %>% 
  rownames_to_column(var = "KO")%>% 
    filter(pval.adj <= 0.05)%>%
    arrange(pval.adj)
test.pw.p.sig
```

### The novobiocin biosynthesis and the pathway for x-membered macrolides were also identified as signifiant using the aldex package with function aldex.corr
### In total there were 17 significant pathways (BH < 0.1) based on aldex.corr
#### (Dieldrin concentrations were not significantly associated with any pathways (aldex.corr), only dieldrin loss) 

```{r echo=FALSE, message=FALSE, warning=FALSE}
alx.cor.sig <- read_csv("~/Documents/Study/LaTrobe/Research/phD/Milestone2_Field_Survey/M2_Bacteria/Picrust/alx.cor.sig_KEGGpathways_dloss_2020019.csv") %>% arrange(BH)
kable(alx.cor.sig)
```


```{r fig.height=7, fig.width=10, echo=FALSE}
cls <- c("grey8","blue")
ps_clr.mlt.pw <- psmelt(ps_clr.pw) #melt dataframe for plotting
ps_clr.mlt.pw  <- ps_clr.mlt.pw  %>% subset(OTU %in% as.vector(alx.cor.sig$rowname)) #filter the ECs that significant correlate with PC1,2 and 3
ps_clr.mlt.pw %>% 
  ggplot(data = ., aes(x = Dloss, y = Abundance))+
  geom_point(aes(color = Soil, shape = Soil)) +
  labs(x = "", y = "Abundance (clr)\n") +
  facet_wrap(~ OTU, scales = "free") +
  ggtitle("Pathway predictions vs Dloss (aldex.corr, BH < 0.1)")+ 
  theme(strip.text = element_text(size = 8))+
  scale_colour_manual(values = cls) 
  #stat_cor(method = "spearman",label.y.npc = "bottom")
```
## boxplots of degradation pathways
```{r echo=, fig.height=8, fig.width=10}
ps_clr.mlt.pw %>% 
  filter(str_detect(OTU,"degradation") |
           str_detect(OTU,"ABC")) %>% 
  ggboxplot(data = ., x = "Dloss_fct", y = "Abundance", 
            facet.by = "OTU", color = "Soil", shape = "Soil") + 
  labs(x = "", y = "Abundance (clr)\n") +
  ggtitle("Pathway predictions vs Dloss (aldex.corr, BH < 0.1)")+ 
  scale_colour_manual(values = cls)
```
## boxplots of remaining pathways
```{r echo=, fig.height=5, fig.width=6}
ps_clr.mlt.pw %>% 
  filter(str_detect(tolower(OTU),"biosynthesis|factor")) %>% 
  ggboxplot(data = ., x = "Dloss_fct", y = "Abundance", 
            facet.by = "OTU", color = "Soil", shape = "Soil") + 
  labs(x = "", y = "Abundance (clr)\n") +
  ggtitle("Pathway predictions vs Dloss (aldex.corr, BH < 0.1)")+ 
  scale_colour_manual(values = cls)
```



### How many of those pathways are also associated with PC1, PC2 or PC3?
```{r}
ind_cont_PCA_top.pw  %>% rownames_to_column(var = "pathway") %>% 
  subset(pathway %in% as.vector(alx.cor.sig$rowname))
```

## Ancom significant different pathways by soil

![Ancom](/Users/christiankrohn/Applications/RStudio/R_notebooks/visualization.png)

```{r, echo= FALSE, warning=FALSE,  message=FALSE}
#pathway.pred  %>% t %>% as.data.frame() %>% rownames_to_column(var = "#SampleID") %>%
#  left_join(metadata3, by = '#SampleID')%>% 
#  select(matches('sample|Soil|ecm|macrolides'))%>% group_by(Soil) %>% summarise_all(list(mean))

bp1 <- pathway.pred  %>% t %>% as.data.frame() %>% rownames_to_column(var = "#SampleID") %>%
  left_join(metadata3, by = '#SampleID') %>% 
    melt(data = .) %>% 
    filter(str_detect(tolower(variable),"ecm|macrolides|isoflavonoid|invasion")) %>% 
    ggbarplot(data = ., x = "variable", y = "value", add = "mean_se", orientation = "horiz", fill = "Soil",position = position_dodge(0.7), width = 0.5, title = "Absolute pathway prediction", xlab = "") + 
  scale_fill_manual(values = cls)

bp2 <-   otutable.pw %>% rownames_to_column(var = "#SampleID")  %>%
  left_join(metadata3, by = '#SampleID') %>% 
    melt(data = .) %>% 
    filter(str_detect(tolower(variable),"ecm|macrolides|isoflavonoid|invasion")) %>% 
    ggboxplot(data = ., x = "variable", y = "value", orientation = "horiz", fill = "Soil", title = "clr ratios", xlab = "") + 
  scale_fill_manual(values = cls)

bp1
bp2
```

## Aldex (aldex.clr(aldex.df, cond, mc.samples=10000, denom="iqlr", verbose=F) to find significant different pathways for each soil (29 pathways *p* <= 0.01 BH corrected)

```{r echo=FALSE, fig.height=8, message=FALSE, warning=FALSE}
alx.ttest_pw_soil <- read_csv("~/Documents/Study/LaTrobe/Research/phD/Milestone2_Field_Survey/M2_Bacteria/Picrust/aldex.x.all_keggpw_soil.csv") %>% arrange(we.eBH) %>% filter(we.eBH <= 0.01)

filt.vc <- alx.ttest_pw_soil$X1
bp3 <- pathway.pred  %>% t %>% as.data.frame() %>% rownames_to_column(var = "#SampleID") %>%
  left_join(metadata3, by = '#SampleID') %>% 
    melt(data = .) %>% 
    filter(variable %in% filt.vc) %>% 
    ggbarplot(data = ., x = "variable", y = "value", add = "mean_se", orientation = "horiz", fill = "Soil",position = position_dodge(0.7), width = 0.5, title = "Absolute pathway prediction", xlab = "") + 
  scale_fill_manual(values = cls)


bp4 <- otutable.pw %>% rownames_to_column(var = "#SampleID")  %>%
  left_join(metadata3, by = '#SampleID') %>% 
    melt(data = .) %>% 
   filter(variable  %in%  as.vector(alx.ttest_pw_soil$X1)) %>% 
    ggboxplot(data = ., x = "variable", y = "value", orientation = "horiz", fill = "Soil", title = "clr ratios", xlab = "") + 
  scale_fill_manual(values = cls)

bp3
bp4
```









