This code is related to data cleaning and analysis for the microbiome data. Since the final version of the manuscript is not yet out and for confidentiality reasons, the corresponding data sets will not be made available. However we will use the command head() to show you the general format of datasets. We will focus on how to draw a microbial network based on Spearman correlations.
Prerequisites
We first load the corresponding data we will be using for the analysis, then we load required packages.
taxa_table <- read.csv("D:/Chris/ETUDES1/PhD thesis/R files/DATABASE (excel)/Microbiome/HN00162625_OTU1/usable in R/CSVs/phyloseq/tax_table.csv", row.names = 1)
metadata <- read.csv("D:/Chris/ETUDES1/PhD thesis/R files/DATABASE (excel)/Microbiome/HN00162625_OTU1/usable in R/CSVs/phyloseq/metadata.csv", row.names = 1)
sp_ratio <- read.csv("D:/Chris/ETUDES1/PhD thesis/R files/DATABASE (excel)/Microbiome/HN00162625_OTU1/usable in R/CSVs/species_ratio.csv")
head(taxa_table)## phylum class order family
## Otu1 Actinobacteria Actinomycetia Corynebacteriales Mycobacteriaceae
## Otu2 Actinobacteria Actinomycetia Corynebacteriales Nocardiaceae
## Otu3 Actinobacteria Actinomycetia Micrococcales Microbacteriaceae
## Otu4 Actinobacteria Actinomycetia Micrococcales Microbacteriaceae
## Otu5 Actinobacteria Actinomycetia Micrococcales Micrococcaceae
## Otu6 Actinobacteria Actinomycetia Micrococcales Micrococcaceae
## genus species
## Otu1 Mycolicibacterium Mycolicibacterium frederiksbergense
## Otu2 Nocardia Nocardia shimofusensis
## Otu3 Curtobacterium Curtobacterium oceanosedimentum
## Otu4 Rhodoluna Rhodoluna lacicola
## Otu5 Pseudarthrobacter Pseudarthrobacter phenanthrenivorans
## Otu6 Rothia Rothia mucilaginosa
head(metadata)## treatment pe_supplementation body_weight ileum_weight ceacum_length label
## C1 0%.SSPP No 1582 0.8337547 15.00 0.01
## C2 0%.SSPP No 1548 0.9198966 17.75 0.01
## C3 0%.SSPP No 1778 0.8003375 16.75 0.01
## C4 0%.SSPP No 1394 0.8680057 16.00 0.01
## C5 0%.SSPP No 1746 0.6844215 13.00 0.01
## C6 0%.SSPP No 1310 0.9862595 15.50 0.01
## numOtus
## C1 250
## C2 250
## C3 250
## C4 250
## C5 250
## C6 250
head(sp_ratio)## Species C1 C2 C3 C4 C5 C6 C7 C8 L1 L2 L3
## 1 __Mycolicibacterium frederiksbergense 0 0 0 0 0 0 0 0 0.00e+00 0 0
## 2 __Nocardia shimofusensis 0 0 0 0 0 0 0 0 0.00e+00 0 0
## 3 __Curtobacterium oceanosedimentum 0 0 0 0 0 0 0 0 0.00e+00 0 0
## 4 __Rhodoluna lacicola 0 0 0 0 0 0 0 0 0.00e+00 0 0
## 5 __Pseudarthrobacter phenanthrenivorans 0 0 0 0 0 0 0 0 0.00e+00 0 0
## 6 __Rothia mucilaginosa 0 0 0 0 0 0 0 0 8.08e-05 0 0
## L4 L5 L6 L7 L8 H1 H2 H3 H4 H5 H6 H7 H8
## 1 0 0 0.000178126 0 0 0 0 0 0 0 0 0.000193767 0
## 2 0 0 0.000000000 0 0 0 0 0 0 0 0 0.000096900 0
## 3 0 0 0.000000000 0 0 0 0 0 0 0 0 0.000096900 0
## 4 0 0 0.000000000 0 0 0 0 0 0 0 0 0.000096900 0
## 5 0 0 0.000000000 0 0 0 0 0 0 0 0 0.001614726 0
## 6 0 0 0.000000000 0 0 0 0 0 0 0 0 0.000000000 0
The taxa_table dataset will contain taxonomic infos about our Otu, metadata refers to informations about our samples, and sp_ratio will contain relative abundance of each Otu per samples. Next we load our libraries.
As you might see, we will use the igraph library to draw our networks, and the tidyverse will be used mainly for data wrangling and tidying.
Tidying dataset & first network plot
We will transform the data in a suitable form to facilitate network construction.
sp_ratio <- sp_ratio %>%
as_tibble() %>%
rename_all(tolower) %>%
mutate(species = str_replace_all(species, "__", "")) %>%
column_to_rownames(var = "species")Next we use prevalence filtering to remove Otus that are not expressed in at least 13 samples, then we apply Spearman’s correlation on the data set to get a correlation matrix. Afterward, we define a threshold of correlation value to be represented in the network, otherwise we might end up with a network containing hundreds of nodes and edges (which will make it difficult to read and interpret). Next we create the network file and we remove nodes without edges. Finally we have a first look at our plot which is not really informative at this stage.
min.prevalence=13
incidence=sp_ratio
incidence[incidence>0]=1
sp_ratio_filtered <- sp_ratio[which(rowSums(incidence)>=min.prevalence),] ### end of prevalence filtering
sp_correl <- sp_ratio_filtered %>%
t() %>%
cor(method = "spearman") ### correlation calculations
sp_correl[abs(sp_correl)<0.65]=0 ### define threshold for correlations
net_work <- graph_from_adjacency_matrix(sp_correl,mode="lower",weighted=TRUE, diag=FALSE) ## create network file
net_work <- delete.vertices(net_work,degree(net_work)==0) #remove nodes without edges
plot(net_work, vertex.label = NA, edge.width = 5, vertex.size = 10) ## first plot Adding attributes to the network
Now we will enhance the network appearance by adding attributes to the nodes and edges. It will also make the plot much more readable and will help us during interpretations. Each major steps of the code will be explained.
taxa_tbl <- taxa_table %>%
as_tibble() %>%
mutate(species = str_replace_all(species, "__", "")) #tidy taxonomic infos
net_work_used <- V(net_work)$name %>%
as_tibble() %>%
mutate(species = value) %>%
select(species) #extract species represented in the network
v_attr <- sp_ratio %>% ### we create a table of attributes for nodes (vertex)
rownames_to_column( var = "species") %>%
as_tibble() %>%
pivot_longer(-species, names_to = "sample_id", values_to = "ratio" ) %>%
mutate(species = str_replace_all(species, "__", "")) %>%
group_by(species) %>%
summarise(rel_abundance = sum(ratio)) %>%
inner_join(net_work_used, by = "species") %>%
inner_join(taxa_tbl, by = "species") %>%
mutate(rel_abundance = abs(exp (rel_abundance))) # we join taxonomic infos, relative abundance and species that were only represented in the network to create the attribute table
network_table <- igraph::as_data_frame(net_work, 'both') ##we convert the network in data frames
network_table$vertices <- network_table$vertices %>%
as_tibble() %>%
inner_join(v_attr, by = c("name"="species")) # we add our attribute in the data frames
net_work1 <- graph_from_data_frame(network_table$edges,
directed = F,
vertices = network_table$vertices) # we convert back data frames to a networkPlotting a network with edges coloured by class
Now we will use the attribute class to color each edges of the network.
mom_data <- V(net_work1)$class %>%
as_tibble_col(column_name = "species") #formating the class variable as factor (is needed for coloring edges)
mom_data$species <- as_factor(mom_data$species)
color_easy <- c("pink", "brown", "orange", "dodgerblue", "lightgray")[mom_data$species] #creating a color palette to represent each class levels
V(net_work1)$color <- color_easy ## we have now the color attributes based on class
E(net_work1)$sign <- E(net_work1)$weight ##we create another attribute from the weight
E(net_work1)$weight <- abs(E(net_work1)$weight) ## we then use absolute value of weight because of the specific layout we will use
### details about the network plot (with class attribute used for coloring vertex)
plot(net_work1, vertex.size= 8, #size of nodes
edge.width=abs(E(net_work1)$weight)*4, #width of edges (edge is a segment between nodes)
vertex.label = NA, #remove microbes names
edge.color=ifelse(E(net_work1)$sign > 0, "blue","red"), # color edges based on weight sign
layout=layout_with_kk(net_work1)) #specific layout
title_legend1 <- V(net_work1)$class %>%
as_factor() %>%
levels() #extracting class levels in an object for the legend
legend(x = 0.78, y = 1, title_legend1,
pch = 21, pt.bg = c("pink", "brown", "orange", "dodgerblue", "lightgray"),
pt.cex = 2.5, bty = "n", ncol = 1) #adding legendPlotting a network (basic)
We will plot the basic network that will have the name of the vertex included.
png("netnames.png", units="in", width=13, height=13, res=1000)
plot(net_work1, vertex.size= 8, edge.width=abs(E(net_work1)$weight)*4,
vertex.label.cex = 0.8, vertex.color = "gold", vertex.label.color = "black",
vertex.label.font = 3, vertex.label.dist = 1.5,
edge.color=ifelse(E(net_work1)$sign > 0, "blue","red"),
layout=layout_with_kk(net_work1))
dev.off()## png
## 2
Here we repeat the same code chunk to show you the final result. Since the previous one will save the plot in your working directory.
plot(net_work1, vertex.size= 8, edge.width=abs(E(net_work1)$weight)*4,
vertex.label.cex = 0.8, vertex.color = "gold", vertex.label.color = "black",
vertex.label.font = 3, vertex.label.dist = 1.5,
edge.color=ifelse(E(net_work1)$sign > 0, "blue","red"),
layout=layout_with_kk(net_work1))Plotting a network with Phylum as attribute
We will repeat a similar process to what we used for using class as an attribute to color nodes.
net_work2 <- graph_from_data_frame(network_table$edges,
directed = F,
vertices = network_table$vertices) # we convert back data frames to a network
mom_data_2 <- V(net_work2)$phylum %>%
as_tibble_col(column_name = "species") #formating the class variable as factor (is needed for coloring edges)
mom_data_2$species <- as_factor(mom_data_2$species)
color_easy_2 <- c("dodgerblue", "lightgray")[mom_data_2$species] #creating a color palette to represent each class levels
V(net_work2)$color <- color_easy_2 ## we have now the color attributes based on class
E(net_work2)$sign <- E(net_work2)$weight ##we create another attribute from the weight
E(net_work2)$weight <- abs(E(net_work2)$weight) ## we then use absolute value of weight because of the specific layout we will use
### details about the network plot (with class attribute used for coloring vertex)
plot(net_work2, vertex.size= 8, #size of nodes
edge.width=abs(E(net_work2)$weight)*4, #width of edges (edge is a segment between nodes)
vertex.label = NA, #remove microbes names
edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
layout=layout_with_kk(net_work2)) #specific layout
title_legend2 <- V(net_work2)$phylum %>%
as_factor() %>%
levels() #extracting class levels in an object for the legend
legend(x = 0.75, y = 1, title_legend2,
pch = 21, pt.bg = c("dodgerblue", "lightgray"),
pt.cex = 2.5, bty = "n", ncol = 1) #adding legendPlotting two networks together in one layout
Since the plots obtained from igraph are not ggplots like it is impossible to use the “patchwork” package for combining plots. Thus we will use the function in base R for doing so.
png("netcombined.png", units="in", width=16, height=8, res=1000)
par(mfrow=c(1,2), mar=c(0,0,0,0))
plot(net_work1, vertex.size= 8, #size of nodes
edge.width=abs(E(net_work1)$weight)*4, #width of edges (edge is a segment between nodes)
vertex.label = NA, #remove microbes names
edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
layout=layout_with_kk(net_work1)) #specific layout
legend(x = 0.75, y = 1, title_legend1,
pch = 21, pt.bg = c("pink", "brown", "orange", "dodgerblue", "lightgray"),
pt.cex = 2.5, bty = "n", ncol = 1) #adding legend
plot(net_work2, vertex.size= 8, #size of nodes
edge.width=abs(E(net_work2)$weight)*4, #width of edges (edge is a segment between nodes)
vertex.label = NA, #remove microbes names
edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
layout=layout_with_kk(net_work2)) #specific layout
legend(x = 0.72, y = 1, title_legend2,
pch = 21, pt.bg = c("dodgerblue", "lightgray"),
pt.cex = 2.5, bty = "n", ncol = 1) #adding legend
dev.off()## png
## 2
Here we repeat the same code chunk to show you the final result. Since the previous one will save the plot in your working directory.
par(mfrow=c(1,2), mar=c(0,0,0,0))
plot(net_work1, vertex.size= 8, #size of nodes
edge.width=abs(E(net_work1)$weight)*4, #width of edges (edge is a segment between nodes)
vertex.label = NA, #remove microbes names
edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
layout=layout_with_kk(net_work1)) #specific layout
legend(x = 0.75, y = 1, title_legend1,
pch = 21, pt.bg = c("pink", "brown", "orange", "dodgerblue", "lightgray"), cex = 1,
pt.cex = 1.5, bty = "n", ncol = 1) #adding legend
plot(net_work2, vertex.size= 8, #size of nodes
edge.width=abs(E(net_work2)$weight)*4, #width of edges (edge is a segment between nodes)
vertex.label = NA, #remove microbes names
edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
layout=layout_with_kk(net_work2)) #specific layout
legend(x = 0.72, y = 1, title_legend2,
pch = 21, pt.bg = c("dodgerblue", "lightgray"), cex = 1,
pt.cex = 1.5, bty = "n", ncol = 1) #adding legend