This code is related to data cleaning and analysis for the microbiome data. Since the final version of the manuscript is not yet out and for confidentiality reasons, the corresponding data sets will not be made available. However we will use the command head() to show you the general format of datasets. We will focus on how to draw a microbial network based on Spearman correlations.

Prerequisites

We first load the corresponding data we will be using for the analysis, then we load required packages.

taxa_table <-  read.csv("D:/Chris/ETUDES1/PhD thesis/R files/DATABASE (excel)/Microbiome/HN00162625_OTU1/usable in R/CSVs/phyloseq/tax_table.csv", row.names = 1)
metadata <-  read.csv("D:/Chris/ETUDES1/PhD thesis/R files/DATABASE (excel)/Microbiome/HN00162625_OTU1/usable in R/CSVs/phyloseq/metadata.csv", row.names = 1)
sp_ratio <- read.csv("D:/Chris/ETUDES1/PhD thesis/R files/DATABASE (excel)/Microbiome/HN00162625_OTU1/usable in R/CSVs/species_ratio.csv")

head(taxa_table)
##              phylum         class             order            family
## Otu1 Actinobacteria Actinomycetia Corynebacteriales  Mycobacteriaceae
## Otu2 Actinobacteria Actinomycetia Corynebacteriales      Nocardiaceae
## Otu3 Actinobacteria Actinomycetia     Micrococcales Microbacteriaceae
## Otu4 Actinobacteria Actinomycetia     Micrococcales Microbacteriaceae
## Otu5 Actinobacteria Actinomycetia     Micrococcales    Micrococcaceae
## Otu6 Actinobacteria Actinomycetia     Micrococcales    Micrococcaceae
##                  genus                              species
## Otu1 Mycolicibacterium  Mycolicibacterium frederiksbergense
## Otu2          Nocardia               Nocardia shimofusensis
## Otu3    Curtobacterium      Curtobacterium oceanosedimentum
## Otu4         Rhodoluna                   Rhodoluna lacicola
## Otu5 Pseudarthrobacter Pseudarthrobacter phenanthrenivorans
## Otu6            Rothia                  Rothia mucilaginosa
head(metadata)
##    treatment pe_supplementation body_weight ileum_weight ceacum_length label
## C1   0%.SSPP                 No        1582    0.8337547         15.00  0.01
## C2   0%.SSPP                 No        1548    0.9198966         17.75  0.01
## C3   0%.SSPP                 No        1778    0.8003375         16.75  0.01
## C4   0%.SSPP                 No        1394    0.8680057         16.00  0.01
## C5   0%.SSPP                 No        1746    0.6844215         13.00  0.01
## C6   0%.SSPP                 No        1310    0.9862595         15.50  0.01
##    numOtus
## C1     250
## C2     250
## C3     250
## C4     250
## C5     250
## C6     250
head(sp_ratio)
##                                  Species C1 C2 C3 C4 C5 C6 C7 C8       L1 L2 L3
## 1  __Mycolicibacterium frederiksbergense  0  0  0  0  0  0  0  0 0.00e+00  0  0
## 2               __Nocardia shimofusensis  0  0  0  0  0  0  0  0 0.00e+00  0  0
## 3      __Curtobacterium oceanosedimentum  0  0  0  0  0  0  0  0 0.00e+00  0  0
## 4                   __Rhodoluna lacicola  0  0  0  0  0  0  0  0 0.00e+00  0  0
## 5 __Pseudarthrobacter phenanthrenivorans  0  0  0  0  0  0  0  0 0.00e+00  0  0
## 6                  __Rothia mucilaginosa  0  0  0  0  0  0  0  0 8.08e-05  0  0
##   L4 L5          L6 L7 L8 H1 H2 H3 H4 H5 H6          H7 H8
## 1  0  0 0.000178126  0  0  0  0  0  0  0  0 0.000193767  0
## 2  0  0 0.000000000  0  0  0  0  0  0  0  0 0.000096900  0
## 3  0  0 0.000000000  0  0  0  0  0  0  0  0 0.000096900  0
## 4  0  0 0.000000000  0  0  0  0  0  0  0  0 0.000096900  0
## 5  0  0 0.000000000  0  0  0  0  0  0  0  0 0.001614726  0
## 6  0  0 0.000000000  0  0  0  0  0  0  0  0 0.000000000  0

The taxa_table dataset will contain taxonomic infos about our Otu, metadata refers to informations about our samples, and sp_ratio will contain relative abundance of each Otu per samples. Next we load our libraries.

As you might see, we will use the igraph library to draw our networks, and the tidyverse will be used mainly for data wrangling and tidying.

Tidying dataset & first network plot

We will transform the data in a suitable form to facilitate network construction.

sp_ratio <- sp_ratio %>% 
  as_tibble() %>% 
  rename_all(tolower) %>% 
  mutate(species = str_replace_all(species, "__", "")) %>% 
  column_to_rownames(var = "species")

Next we use prevalence filtering to remove Otus that are not expressed in at least 13 samples, then we apply Spearman’s correlation on the data set to get a correlation matrix. Afterward, we define a threshold of correlation value to be represented in the network, otherwise we might end up with a network containing hundreds of nodes and edges (which will make it difficult to read and interpret). Next we create the network file and we remove nodes without edges. Finally we have a first look at our plot which is not really informative at this stage.

min.prevalence=13
incidence=sp_ratio
incidence[incidence>0]=1
sp_ratio_filtered <- sp_ratio[which(rowSums(incidence)>=min.prevalence),] ### end of prevalence filtering

sp_correl <- sp_ratio_filtered %>% 
  t() %>% 
  cor(method = "spearman")  ### correlation calculations

sp_correl[abs(sp_correl)<0.65]=0  ### define threshold for correlations

net_work <- graph_from_adjacency_matrix(sp_correl,mode="lower",weighted=TRUE, diag=FALSE)  ## create network file
net_work <- delete.vertices(net_work,degree(net_work)==0) #remove nodes without edges

plot(net_work, vertex.label = NA, edge.width = 5, vertex.size = 10) ## first plot 

Adding attributes to the network

Now we will enhance the network appearance by adding attributes to the nodes and edges. It will also make the plot much more readable and will help us during interpretations. Each major steps of the code will be explained.

taxa_tbl <- taxa_table %>% 
  as_tibble() %>% 
  mutate(species = str_replace_all(species, "__", "")) #tidy taxonomic infos

net_work_used <- V(net_work)$name %>% 
   as_tibble() %>% 
  mutate(species = value) %>% 
  select(species) #extract species represented in the network

v_attr <- sp_ratio %>% ### we create a table of attributes for nodes (vertex)
  rownames_to_column( var = "species") %>% 
  as_tibble() %>% 
  pivot_longer(-species, names_to = "sample_id", values_to = "ratio" ) %>% 
  mutate(species = str_replace_all(species, "__", "")) %>% 
  group_by(species) %>% 
  summarise(rel_abundance = sum(ratio)) %>% 
  inner_join(net_work_used, by = "species") %>% 
  inner_join(taxa_tbl, by = "species") %>%  
  mutate(rel_abundance = abs(exp (rel_abundance))) # we join taxonomic infos, relative abundance and species that were only represented in the network to create the attribute table

  
network_table <- igraph::as_data_frame(net_work, 'both') ##we convert the network in data frames
     
network_table$vertices <- network_table$vertices %>%
    as_tibble() %>% 
    inner_join(v_attr, by = c("name"="species"))  # we add our attribute in the data frames 
  
  net_work1 <- graph_from_data_frame(network_table$edges,
                                     directed = F,
                                     vertices = network_table$vertices) # we convert back data frames to a network

Plotting a network with edges coloured by class

Now we will use the attribute class to color each edges of the network.

mom_data <- V(net_work1)$class %>% 
  as_tibble_col(column_name = "species") #formating the class variable as factor (is needed for coloring edges)
mom_data$species <- as_factor(mom_data$species)

color_easy <-  c("pink", "brown", "orange", "dodgerblue", "lightgray")[mom_data$species] #creating a color palette to represent each class levels

V(net_work1)$color <-  color_easy ## we have now the color attributes based on class

E(net_work1)$sign <- E(net_work1)$weight ##we create another attribute from the weight

E(net_work1)$weight <- abs(E(net_work1)$weight) ## we then use absolute value of weight because of the specific layout we will use

### details about the network plot (with class attribute used for coloring vertex)

plot(net_work1, vertex.size= 8, #size of nodes
     edge.width=abs(E(net_work1)$weight)*4, #width of edges (edge is a segment between nodes)
     vertex.label = NA, #remove microbes names
     edge.color=ifelse(E(net_work1)$sign > 0, "blue","red"), # color edges based on weight sign
     layout=layout_with_kk(net_work1)) #specific layout

title_legend1 <- V(net_work1)$class %>% 
  as_factor() %>% 
  levels() #extracting class levels in an object for the legend

legend(x = 0.78, y = 1, title_legend1,
       pch = 21, pt.bg = c("pink", "brown", "orange", "dodgerblue", "lightgray"), 
       pt.cex = 2.5, bty = "n", ncol = 1) #adding legend

Plotting a network (basic)

We will plot the basic network that will have the name of the vertex included.

png("netnames.png", units="in", width=13, height=13, res=1000)

plot(net_work1, vertex.size= 8, edge.width=abs(E(net_work1)$weight)*4,
      vertex.label.cex = 0.8, vertex.color = "gold", vertex.label.color = "black",
     vertex.label.font = 3, vertex.label.dist = 1.5,
     edge.color=ifelse(E(net_work1)$sign > 0, "blue","red"),
     layout=layout_with_kk(net_work1))

dev.off()
## png 
##   2

Here we repeat the same code chunk to show you the final result. Since the previous one will save the plot in your working directory.

plot(net_work1, vertex.size= 8, edge.width=abs(E(net_work1)$weight)*4,
      vertex.label.cex = 0.8, vertex.color = "gold", vertex.label.color = "black",
     vertex.label.font = 3, vertex.label.dist = 1.5,
     edge.color=ifelse(E(net_work1)$sign > 0, "blue","red"),
     layout=layout_with_kk(net_work1))

Plotting a network with Phylum as attribute

We will repeat a similar process to what we used for using class as an attribute to color nodes.

net_work2 <- graph_from_data_frame(network_table$edges,
                                     directed = F,
                                     vertices = network_table$vertices) # we convert back data frames to a network

mom_data_2 <- V(net_work2)$phylum %>% 
  as_tibble_col(column_name = "species") #formating the class variable as factor (is needed for coloring edges)
mom_data_2$species <- as_factor(mom_data_2$species)

color_easy_2 <-  c("dodgerblue", "lightgray")[mom_data_2$species] #creating a color palette to represent each class levels

V(net_work2)$color <-  color_easy_2 ## we have now the color attributes based on class

E(net_work2)$sign <- E(net_work2)$weight ##we create another attribute from the weight

E(net_work2)$weight <- abs(E(net_work2)$weight) ## we then use absolute value of weight because of the specific layout we will use

### details about the network plot (with class attribute used for coloring vertex)

plot(net_work2, vertex.size= 8, #size of nodes
     edge.width=abs(E(net_work2)$weight)*4, #width of edges (edge is a segment between nodes)
     vertex.label = NA, #remove microbes names
     edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
     layout=layout_with_kk(net_work2)) #specific layout

title_legend2 <- V(net_work2)$phylum %>% 
  as_factor() %>% 
  levels() #extracting class levels in an object for the legend

legend(x = 0.75, y = 1, title_legend2,
       pch = 21, pt.bg = c("dodgerblue", "lightgray"), 
       pt.cex = 2.5, bty = "n", ncol = 1) #adding legend

Plotting two networks together in one layout

Since the plots obtained from igraph are not ggplots like it is impossible to use the “patchwork” package for combining plots. Thus we will use the function in base R for doing so.

png("netcombined.png", units="in", width=16, height=8, res=1000)

par(mfrow=c(1,2), mar=c(0,0,0,0))

plot(net_work1, vertex.size= 8, #size of nodes
     edge.width=abs(E(net_work1)$weight)*4, #width of edges (edge is a segment between nodes)
     vertex.label = NA, #remove microbes names
     edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
     layout=layout_with_kk(net_work1)) #specific layout

legend(x = 0.75, y = 1, title_legend1,
       pch = 21, pt.bg = c("pink", "brown", "orange", "dodgerblue", "lightgray"), 
       pt.cex = 2.5, bty = "n", ncol = 1) #adding legend

plot(net_work2, vertex.size= 8, #size of nodes
     edge.width=abs(E(net_work2)$weight)*4, #width of edges (edge is a segment between nodes)
     vertex.label = NA, #remove microbes names
     edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
     layout=layout_with_kk(net_work2)) #specific layout

legend(x = 0.72, y = 1, title_legend2,
       pch = 21, pt.bg = c("dodgerblue", "lightgray"), 
       pt.cex = 2.5, bty = "n", ncol = 1) #adding legend

dev.off()
## png 
##   2

Here we repeat the same code chunk to show you the final result. Since the previous one will save the plot in your working directory.

par(mfrow=c(1,2), mar=c(0,0,0,0))

plot(net_work1, vertex.size= 8, #size of nodes
     edge.width=abs(E(net_work1)$weight)*4, #width of edges (edge is a segment between nodes)
     vertex.label = NA, #remove microbes names
     edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
     layout=layout_with_kk(net_work1)) #specific layout

legend(x = 0.75, y = 1, title_legend1,
       pch = 21, pt.bg = c("pink", "brown", "orange", "dodgerblue", "lightgray"), cex = 1, 
       pt.cex = 1.5, bty = "n", ncol = 1) #adding legend

plot(net_work2, vertex.size= 8, #size of nodes
     edge.width=abs(E(net_work2)$weight)*4, #width of edges (edge is a segment between nodes)
     vertex.label = NA, #remove microbes names
     edge.color=ifelse(E(net_work2)$sign > 0, "blue","red"), # color edges based on weight sign
     layout=layout_with_kk(net_work2)) #specific layout

legend(x = 0.72, y = 1, title_legend2,
       pch = 21, pt.bg = c("dodgerblue", "lightgray"), cex = 1, 
       pt.cex = 1.5, bty = "n", ncol = 1) #adding legend