Physicochemical and biological data from anaerobic processes, which underwent 16S rRNA gene sequencing in public SRA repositories of the NCBI, were analyzed to propose potential microbial indicators. The statistical analyses involved categorizing CH₄ yield based on histogram bins, which were then used for further analyses, including diversity analysis, significance test, differential analysis, Venn diagram construction, and correlation analysis. These methods allowed for the identification of significant changes in physicochemical parameters and microbial data, ultimately leading to the inference of potential microbial indicators. This RPub presents the code framework that guided the decision-making process for proposing microbial indicators in anaerobic digesters fed with FW/OFMSW.
For further details, refer to the article “Identifying reliable microbial indicators in anaerobic digestion of organic solid waste: Insights from a meta-analysis”
The statistical framework was divided into two parts: -Part 1: Determination of methane yield performance categories to evaluate OFMSW/FW digesters in terms of physicochemical response and taxonomic diversity. -Part 2: Identification of microbial indicators to assess the performance of OFMSW/FW digesters.
CH4 yield (measured in mL CH4/gVSadded) is commonly used as a process indicator; thus, it was selected as both a categorical and quantitative variable for classifying AD performance. This classification, known as the methane yield category, was derived from the classes of a beta distribution histogram following the Freedman-Diaconis rule.
Freedman-Diaconis rule equation \[ \text{Class width} = 2 \quad \frac{\text{IQR}}{\sqrt[3]{n}} \] where IQR is the interquartile range of the data, and n is the number of observations.
# Load packages
library(readr)
library(here)
library(DT)
library(grDevices)
library(skimr)
library(janitor)
library(fitdistrplus)
library(fdth)
library(ggplot2)# Load Physicochemical data from meta-analysis
df <- read_csv(here("data", "meta-analysis-physicochemical-data.csv"),
locale = locale(encoding = "latin1"), show_col_types = FALSE) %>%
as.data.frame()# Data cleaning
# Replace "ND" with NA in all character-type columns
df[] <- lapply(df, function(x) {
if (is.character(x)) {
x[x == "ND"] <- NA
}
return(x)
})
# Automatically convert columns class (character, numeric) according to the values of their cells and clean names
df <- df %>%
mutate_all(~ type.convert(as.character(.), as.is = TRUE)) %>%
clean_names()
# Create a "pr_ac" and "tan_acetic" ratio variable
df <- df %>%
mutate(pr_ac = acetic_acid_mg_l / propionic_acid_mg_l,
pr_ac = ifelse(is.finite(pr_ac), pr_ac, NA),
tan_acetic = acetic_acid_mg_l / total_ammonia_nitrogen_mg_l,
tan_acetic = ifelse(is.finite(tan_acetic), tan_acetic, NA))
# Save object df as RDS object for upstreams analysis (part 2)
saveRDS(df, file = here("rds","df_physicochemical_data_metanalysis.rds"))# Table with physicochemical parameters
# The following table displays the information collected from multiple FW/OFMSW anaerobic processes.
datatable(df)| Name | df |
| Number of rows | 178 |
| Number of columns | 39 |
| _______________________ | |
| Column type frequency: | |
| character | 16 |
| numeric | 23 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| sample_id | 0 | 1.00 | 4 | 6 | 0 | 178 | 0 |
| label_study | 0 | 1.00 | 3 | 66 | 0 | 138 | 0 |
| run_accesion | 0 | 1.00 | 10 | 11 | 0 | 178 | 0 |
| feedstock | 0 | 1.00 | 10 | 74 | 0 | 8 | 0 |
| reactor_type | 45 | 0.75 | 3 | 9 | 0 | 6 | 0 |
| scale_up | 0 | 1.00 | 9 | 11 | 0 | 3 | 0 |
| feeding_regime | 0 | 1.00 | 5 | 15 | 0 | 3 | 0 |
| digestion_process | 0 | 1.00 | 17 | 20 | 0 | 2 | 0 |
| inoculum_source | 0 | 1.00 | 18 | 47 | 0 | 7 | 0 |
| process_status | 127 | 0.29 | 6 | 12 | 0 | 6 | 0 |
| freedman_class | 80 | 0.55 | 2 | 2 | 0 | 6 | 0 |
| group | 0 | 1.00 | 7 | 16 | 0 | 2 | 0 |
| primers | 0 | 1.00 | 65 | 81 | 0 | 54 | 0 |
| region_of_16s_r_rna | 0 | 1.00 | 2 | 5 | 0 | 3 | 0 |
| reference | 0 | 1.00 | 14 | 26 | 0 | 12 | 0 |
| doi | 0 | 1.00 | 41 | 47 | 0 | 12 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| temperature_c | 0 | 1.00 | 39.61 | 6.68 | 35.0 | 37.00 | 37.0 | 37.00 | 55.00 | ▇▁▁▁▂ |
| organic_loading_rate_g_vs_l_d | 106 | 0.40 | 4.00 | 3.98 | 0.4 | 3.00 | 3.0 | 3.00 | 18.90 | ▇▁▁▁▁ |
| hrt_d | 125 | 0.30 | 11.23 | 13.18 | 3.0 | 10.00 | 10.0 | 10.00 | 100.00 | ▇▁▁▁▁ |
| total_ammonia_nitrogen_mg_l | 51 | 0.71 | 3434.70 | 1839.11 | 532.8 | 1970.00 | 3000.0 | 4320.00 | 7500.00 | ▅▇▅▃▂ |
| free_ammonia_nitrogen_mg_l | 150 | 0.16 | 100.91 | 50.76 | 24.9 | 80.00 | 100.0 | 137.00 | 186.00 | ▆▅▇▂▇ |
| volatile_fatty_acid_mg_l | 146 | 0.18 | 3119.16 | 3033.15 | 487.0 | 806.00 | 1652.0 | 4388.25 | 9910.00 | ▇▂▂▁▂ |
| acetic_acid_mg_l | 65 | 0.63 | 827.80 | 928.63 | 0.0 | 91.00 | 451.0 | 1333.00 | 3266.00 | ▇▂▂▁▁ |
| propionic_acid_mg_l | 65 | 0.63 | 767.70 | 1417.16 | 0.0 | 40.00 | 195.0 | 439.00 | 5768.00 | ▇▁▁▁▁ |
| butyric_acid_mg_l | 98 | 0.45 | 137.06 | 191.14 | 0.0 | 8.75 | 64.0 | 221.00 | 1035.00 | ▇▂▁▁▁ |
| valeric_acid_mg_l | 123 | 0.31 | 578.21 | 683.26 | 1.0 | 90.40 | 255.0 | 813.50 | 2000.00 | ▇▂▁▂▂ |
| caproic_acid_mg_l | 171 | 0.04 | 296.14 | 332.23 | 1.0 | 1.00 | 250.0 | 499.50 | 821.00 | ▇▂▂▂▂ |
| isobutyric_acid_mg_l | 129 | 0.28 | 204.84 | 189.05 | 0.0 | 91.00 | 96.0 | 416.00 | 639.00 | ▇▂▁▂▁ |
| isovaleric_acid_mg_l | 139 | 0.22 | 217.21 | 241.20 | 0.0 | 1.00 | 101.0 | 418.00 | 666.00 | ▇▂▁▂▂ |
| formic_acid_mg_l | 172 | 0.03 | 10.00 | 0.00 | 10.0 | 10.00 | 10.0 | 10.00 | 10.00 | ▁▁▇▁▁ |
| myristic_acid_mg_l | 148 | 0.17 | 471.30 | 276.63 | 105.0 | 220.00 | 375.0 | 729.00 | 916.00 | ▆▇▁▅▅ |
| palmitic_acid_mg_l | 148 | 0.17 | 4977.00 | 3667.00 | 355.0 | 1568.75 | 3357.0 | 8618.00 | 10847.00 | ▇▇▁▅▅ |
| stearic_acid_mg_l | 148 | 0.17 | 2889.33 | 1786.59 | 65.0 | 1034.00 | 2703.5 | 4803.00 | 5252.00 | ▆▂▃▁▇ |
| oleic_acid_mg_l | 148 | 0.17 | 2664.57 | 1474.35 | 39.0 | 1232.00 | 2257.5 | 4019.00 | 4756.00 | ▁▅▆▁▇ |
| linoleic_acid_mg_l | 148 | 0.17 | 655.70 | 218.75 | 131.0 | 552.00 | 711.5 | 778.00 | 923.00 | ▁▂▂▆▇ |
| sulfate_mg_l | 176 | 0.01 | 118.50 | 6.36 | 114.0 | 116.25 | 118.5 | 120.75 | 123.00 | ▇▁▁▁▇ |
| ch4_yield_m_lch4_g_v_sadded | 80 | 0.55 | 370.21 | 187.67 | 0.0 | 237.00 | 428.0 | 533.25 | 684.00 | ▃▅▃▇▅ |
| pr_ac | 81 | 0.54 | 8.40 | 31.45 | 0.0 | 0.81 | 2.2 | 4.00 | 182.50 | ▇▁▁▁▁ |
| tan_acetic | 96 | 0.46 | 0.32 | 0.30 | 0.0 | 0.06 | 0.2 | 0.44 | 0.97 | ▇▇▂▁▃ |
# Select the CH4 yield column considering it as the variable of interest that will define the performance categories
# CH4 yield data frame
CH4_yield_df <- df %>%
dplyr::filter(!is.na(ch4_yield_m_lch4_g_v_sadded) & ch4_yield_m_lch4_g_v_sadded != "") %>%
rename(YCH4_mLgVS = ch4_yield_m_lch4_g_v_sadded)
# Save object CH4_yield_df as RDS object for upstreams analysis (part 2)
saveRDS(CH4_yield_df, file = here("rds","CH4_yield_df.rds"))# Normality test - Shapiro-Wilk test If p > 0.05, the data follow a normal distribution
shapiro.test(CH4_yield_df$YCH4_mLgVS) # p < 0.05##
## Shapiro-Wilk normality test
##
## data: CH4_yield_df$YCH4_mLgVS
## W = 0.91781, p-value = 1.301e-05
# Since p < 0.05 the data do not follow a normal distribution. The results indicate a beta distribution.
descdist(CH4_yield_df$YCH4_mLgVS, discrete = FALSE, boot = 1000) ## summary statistics
## ------
## min: 0 max: 684
## median: 428
## mean: 370.2112
## estimated sd: 187.6712
## estimated skewness: -0.5623359
## estimated kurtosis: 2.233972
# See adjustments. Compare normal distribution and beta distribution
fit.norm <- fitdist(CH4_yield_df$YCH4_mLgVS, "norm")
# Plot the data distribution
plot(fit.norm)## 0% 25% 50% 75% 100%
## 0.00 237.00 428.00 533.25 684.00
# Define intervals using the Freedman–Diaconis rule
interval_FD <- nclass.FD(CH4_yield_df$YCH4_mLgVS)
# Store the CH4 yield data in a vector
data_YCH4_mLgVS <- CH4_yield_df$YCH4_mLgVS
# Create a table with the CH4 yield intervals obtained
table.FD <- transform(table(cut(data_YCH4_mLgVS, breaks = interval_FD))) %>%
rename(YCH4_mLgVS_intervals = Var1) # Rename column
datatable(table.FD)# Extract intervals and values from "table.FD"
# Get the intervals from the "YCH4_mLgVS_intervals" column in "table.FD"
intervals <- table.FD$YCH4_mLgVS_intervals
# Extract the limits for each interval for the cut() function
breaks <- c(-Inf, 114, 228, 342, 456, 570, 685) # The cutoff values for each class
# Use cut() to categorize the values in "data_YCH4_mLgVS" according to the intervals
categories <- cut(data_YCH4_mLgVS, breaks = breaks, labels = c("C1", "C2", "C3", "C4", "C5", "C6"), include.lowest = TRUE)
# Create the dataframe with the yields and categories
CH4_yield_df_2 <- data.frame(
YCH4_mLgVS = data_YCH4_mLgVS,
Freedman = as.factor(categories))# Histogram with Freedman-Diaconis classes
my_bar_FD <- barplot(table.FD$Freq,
border = FALSE,
las = 2,
ylim = c(0,40),
ylab = "Frequency",
xlab = "Classes of CH4 yield in Food-waste/OFMSW anaerobic digesters",
names.arg = c("C1", "C2", "C3", "C4", "C5", "C6"),
cex.axis = 0.75,
cex.lab = 0.75,
cex.names = 0.75,
col = "darkred")
# Add separation line (Interval size = 1.2)
abline(v = c(1.3, 2.5, 3.7, 4.9, 6.1, 7.3, 8.5), col = "grey") # FD
text(my_bar_FD, table.FD$Freq + 1.4, paste("n: ", table.FD$Freq, sep = ""), cex = 0.5)Histogram showing the frequency distribution of CH4 yields grouped by class.
# Assign different colors to each class
colors <- c("#cc3300ff", "#ff9966ff", "#004cffff", "#ffcc00ff", "#99cc33ff", "#339900ff")
# Create the boxplot with different colors for each class
myplot_FD <- boxplot(YCH4_mLgVS ~ Freedman, data= CH4_yield_df_2, ylab="", xlab = "",
border = colors, # Assign colors to the boxes
col = colors, # Colors for the inside of the boxes
xaxt="n", # Remove ticks from the x-axis
yaxt = "n") # Remove ticks from the y-axis
# Y-axis
axis(2, ylim=c(0,1000), col="black", las=1, cex.axis = 0.75)
mtext("CH4 yield (mLCH4 gVS)", side=2, line=2.2, cex = 0.75)
# X-axis
my_names <- sapply(strsplit(myplot_FD$names, '\\.') , function(x) x[[1]])
my_names <- my_names[seq(1, length(my_names), 1)]
axis(1, at = seq(1, 6, 1), labels = my_names, tick=FALSE, cex.axis = 0.75)
mtext("Freedman-Diaconis Classes", side=1, line=2, cex = 0.75)
# Add grey vertical lines
for(i in seq(0.5 , 20 , 1)){
abline(v=i, lty=1, col="grey")
}Boxplot illustrating CH4 yield values grouped for each class.
The physicochemical parameters and methane yield categories were assessed using the Kruskal-Wallis test and Wilcoxon rank sum test with Benjamini-Hochberg correction for tests among and within methane yield categories respectively.
# Load packages
library(here)
library(dplyr)
library(reshape2)
library(ggplot2)
library(knitr)
library(kableExtra)
library(DT)
library(ggtext)# Select relevant physicochemical parameters
op_cond_wide <- CH4_yield_df %>%
dplyr::select("sample_id", "total_ammonia_nitrogen_mg_l", "free_ammonia_nitrogen_mg_l", "volatile_fatty_acid_mg_l",
"acetic_acid_mg_l", "propionic_acid_mg_l", "butyric_acid_mg_l", "valeric_acid_mg_l", "caproic_acid_mg_l",
"isobutyric_acid_mg_l", "isovaleric_acid_mg_l", "formic_acid_mg_l", "myristic_acid_mg_l",
"palmitic_acid_mg_l", "stearic_acid_mg_l", "oleic_acid_mg_l", "linoleic_acid_mg_l", "sulfate_mg_l",
"YCH4_mLgVS", "pr_ac", "tan_acetic", "freedman_class")
# Change dataframe format from wide format to long format
op_cond <- melt(op_cond_wide)
# Convert CH4 yield categories en factor class
freedman_levels <- c("C1","C2","C3","C4","C5","C6")
op_cond$freedman_class <- factor(op_cond$freedman_class, levels = freedman_levels)# Select physicochemical parameters
variables_interest <- c("volatile_fatty_acid_mg_l","acetic_acid_mg_l","butyric_acid_mg_l",
"propionic_acid_mg_l","isobutyric_acid_mg_l","isovaleric_acid_mg_l",
"valeric_acid_mg_l","total_ammonia_nitrogen_mg_l","free_ammonia_nitrogen_mg_l",
"linoleic_acid_mg_l","myristic_acid_mg_l", "oleic_acid_mg_l",
"palmitic_acid_mg_l","stearic_acid_mg_l","pr_ac", "tan_acetic")# Create an empty list to store the results of significance tests and figures.
results <- list()
# Iteration over each variable
for (var in variables_interest) {
phys_data <- op_cond %>%
dplyr::filter(variable == var) %>%
dplyr::select(variable, freedman_class, value) %>%
stats::na.omit()
# Kruskal-Wallis Test
kruskal <- kruskal.test(value ~ freedman_class, data = phys_data)
# Wilcoxon test with BH correction
wilcoxon <- pairwise.wilcox.test(phys_data$value, phys_data$freedman_class, p.adjust.method = "BH")
# Box-and-whisker plot
# Assign fixed colors to each CH4 yield categories
fixed_colors <- c("#cc3300", "#ff9966", "#004cff", "#ffcc00", "#99cc33", "#339900")
p <- ggplot(phys_data, aes(x = freedman_class, y = value, color = freedman_class)) +
geom_boxplot(outlier.shape = NA, outlier.size = 0.5, notch = FALSE) +
geom_jitter(width = 0.2) +
scale_color_manual(values = setNames(fixed_colors, freedman_levels), drop = FALSE) +
theme_bw() +
labs(title = var) +
theme(plot.title = element_markdown(size = 10, face = "bold"))
# Save results in a list
results[[var]] <- list(
kruskal = kruskal,
wilcoxon = wilcoxon,
figure = p
)
}📈 Statistical analysis for volatile_fatty_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 16.752, df = 3, p-value = 0.0007946
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C2 C3 C4
C3 0.072 - -
C4 0.090 0.038 -
C5 0.064 0.010 0.064
P value adjustment method: BH
📈 Statistical analysis for acetic_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 28.131, df = 5, p-value = 3.432e-05
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C1 C2 C3 C4 C5
C2 0.2152 - - - -
C3 0.2381 0.7912 - - -
C4 0.7912 0.1329 0.0648 - -
C5 0.7912 0.0096 0.0077 0.7757 -
C6 0.1329 0.0077 0.0020 0.0094 0.0020
P value adjustment method: BH
📈 Statistical analysis for butyric_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 11.759, df = 5, p-value = 0.03824
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C1 C2 C3 C4 C5
C2 0.90 - - - -
C3 1.00 0.32 - - -
C4 0.45 0.32 0.27 - -
C5 0.32 0.32 0.27 0.32 -
C6 0.77 0.45 0.32 0.32 0.32
P value adjustment method: BH
📈 Statistical analysis for propionic_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 3.748, df = 5, p-value = 0.5862
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C1 C2 C3 C4 C5
C2 1.00 - - - -
C3 1.00 1.00 - - -
C4 1.00 1.00 0.46 - -
C5 1.00 1.00 0.46 1.00 -
C6 1.00 1.00 0.46 1.00 1.00
P value adjustment method: BH
📈 Statistical analysis for isobutyric_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 22.894, df = 3, p-value = 4.25e-05
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C2 C3 C4
C3 0.0484 - -
C4 0.0104 0.0270 -
C5 0.0051 0.0035 0.0270
P value adjustment method: BH
📈 Statistical analysis for isovaleric_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 6.2312, df = 3, p-value = 0.1009
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C2 C3 C4
C3 0.047 - -
C4 0.245 0.425 -
C5 0.245 0.245 0.245
P value adjustment method: BH
📈 Statistical analysis for valeric_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 12.421, df = 5, p-value = 0.02945
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C1 C2 C3 C4 C5
C2 0.598 - - - -
C3 0.598 0.757 - - -
C4 0.979 0.598 0.220 - -
C5 0.979 0.504 0.055 0.220 -
C6 1.000 0.745 0.504 0.598 0.598
P value adjustment method: BH
📈 Statistical analysis for total_ammonia_nitrogen_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 23.358, df = 5, p-value = 0.0002884
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C1 C2 C3 C4 C5
C2 0.434 - - - -
C3 0.031 0.013 - - -
C4 0.031 0.040 0.753 - -
C5 0.021 0.013 0.544 0.620 -
C6 0.013 0.013 0.013 0.017 0.101
P value adjustment method: BH
📈 Statistical analysis for free_ammonia_nitrogen_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 3.9648, df = 4, p-value = 0.4108
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C1 C2 C3 C4
C2 0.80 - - -
C3 0.80 0.65 - -
C4 0.65 1.00 1.00 -
C5 0.65 1.00 0.87 0.80
P value adjustment method: BH
📈 Statistical analysis for linoleic_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 9.4066, df = 4, p-value = 0.0517
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C2 C3 C4 C5
C3 0.089 - - -
C4 0.077 0.918 - -
C5 0.077 1.000 0.918 -
C6 0.077 0.834 0.366 0.366
P value adjustment method: BH
📈 Statistical analysis for myristic_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 5.1354, df = 4, p-value = 0.2737
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C2 C3 C4 C5
C3 0.18 - - -
C4 0.47 1.00 - -
C5 0.67 1.00 0.67 -
C6 0.18 1.00 0.67 0.67
P value adjustment method: BH
📈 Statistical analysis for oleic_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 6.939, df = 4, p-value = 0.1391
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C2 C3 C4 C5
C3 0.18 - - -
C4 0.35 0.51 - -
C5 0.51 0.51 0.51 -
C6 0.18 0.21 0.51 0.51
P value adjustment method: BH
📈 Statistical analysis for palmitic_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 5.1354, df = 4, p-value = 0.2737
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C2 C3 C4 C5
C3 0.18 - - -
C4 0.47 1.00 - -
C5 0.67 1.00 0.67 -
C6 0.18 1.00 0.67 0.67
P value adjustment method: BH
📈 Statistical analysis for stearic_acid_mg_l
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 10.386, df = 4, p-value = 0.03441
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C2 C3 C4 C5
C3 0.089 - - -
C4 0.077 0.511 - -
C5 0.077 0.511 0.511 -
C6 0.077 0.125 0.511 0.511
P value adjustment method: BH
📈 Statistical analysis for pr_ac
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 13.181, df = 5, p-value = 0.02174
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C1 C2 C3 C4 C5
C2 0.809 - - - -
C3 0.809 0.809 - - -
C4 0.809 0.907 0.809 - -
C5 0.907 0.809 0.911 0.809 -
C6 0.233 0.029 0.020 0.020 0.014
P value adjustment method: BH
📈 Statistical analysis for tan_acetic
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 27.325, df = 4, p-value = 1.709e-05
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: phys_data$value and phys_data$freedman_class
C2 C3 C4 C5
C3 0.5287 - - -
C4 0.0257 0.0124 - -
C5 0.0080 0.0059 0.5287 -
C6 0.0059 0.0037 0.0080 0.0037
P value adjustment method: BH
Random subsampling without replacement was conducted using the first quartile frequency to rarefy sequencing depth across all samples. This approach guaranteed that richness estimates remained unaffected by variations in sequencing depth.
# Load database
# non-rarefied sOTUs count data from meta-analysis
df_sotus <- read_xlsx(here("data", "meta-analysis-microbial-data.xlsx"), sheet = "otus_abs_count") %>%
column_to_rownames("otu") # Data with bacteria-archaea characterization
df_genera <- read_xlsx(here("data", "meta-analysis-microbial-data.xlsx"), sheet = "genera_abs_count") %>%
column_to_rownames("genera") %>%
dplyr::select(-"F25_7", -"F17_5", -"F17_3", -"F17_6") # Samples without observed archaea# Rarefaction curves of the all samples
# OTU
rarecurve(t(df_sotus), step=50, cex=0.5,
xlab = "Sample Size", ylab = "sOTUs", col = "blue")# Genera
rarecurve(t(df_sotus), step=50, cex=0.5,
xlab = "Sample Size", ylab = "sOTUs", col = "blue")# Obtain the total sum of observed sOTUs/genera for each sample
sample_sums_sotus <- rowSums(t(df_sotus))
sample_sums_genera <- rowSums(t(df_genera))
# Quantiles
q_results <- list(
sOTUs = quantile(sample_sums_sotus),
Genera = quantile(sample_sums_genera)
)
print(q_results)## $sOTUs
## 0% 25% 50% 75% 100%
## 4957 11542 16980 23871 97920
##
## $Genera
## 0% 25% 50% 75% 100%
## 4957.00 11559.25 17021.00 23606.75 97920.00
# Create phyloseq object with data count
ps_otus <- phyloseq(otu_table(df_sotus, taxa_are_rows = TRUE))
ps_genera <- phyloseq(otu_table(df_genera, taxa_are_rows = TRUE))
# Rarefaction using the sampling method without replacement. The first quartile of the sOTUs and genera count found in the samples, which were 11542 and 11559 seqs respectively.
data_raref_otus <- rarefy_even_depth(ps_otus, rngseed = 12345, sample.size=11542, replace = F)
data_raref_genera <- rarefy_even_depth(ps_genera, rngseed = 12345, sample.size=11559, replace = F)
# Convert the sOTUs and genera matrix into a data frame. These data were used to alpha diveristy.
df_raref_otus <- as.data.frame(data_raref_otus)
df_raref_genera <- as.data.frame(data_raref_genera)# Rarefaction Curve Plots - sOTUs
rarecurve(t(df_raref_otus), step=50, cex=0.5,
xlab = "Sample Size", ylab = "sOTUs count", col = "blue")# Rarefaction Curve Plots - Genera
rarecurve(t(df_raref_genera), step=50, cex=0.5,
xlab = "Sample Size", ylab = "Genera count", col = "blue") Diversity analyses were performed and compared using a rarefied count table at sOTU level and genus level. Alpha diversity was grouped from methane yield categories and assessed using multiple indices, including Chao1, Pielou’s evenness, Shannon, and Hill numbers (q0, q1, and q2). The interaction among these categories and alpha diversity was evaluated using Kruskal-Wallis and Wilcoxon rank sum tests with Benjamini-Hochberg correction
# Load packages
library(phyloseq)
library(vegan)
library(hilldiv)
library(reshape2)
library(ggplot2)
library(dplyr)
library(tidyr)
library(here)
library(tibble)
library(ggpubr)
library(forcats)# Create phyloseq object with rarefied data count
ps_rare_sotus <- phyloseq(otu_table(df_raref_otus, taxa_are_rows = TRUE))
ps_rare_genera <- phyloseq(otu_table(df_raref_genera, taxa_are_rows = TRUE))
# Load database
CH4_yield_df <- readRDS(file = here("rds","CH4_yield_df.rds"))# sOTUs count
obs <- estimate_richness(ps_rare_sotus, split = TRUE, measure = "Observed")
# Chao1
chao1 <- estimate_richness(ps_rare_sotus, split = TRUE, measure = "Chao1") %>% dplyr::select("Chao1")
# Ace
ace <- estimate_richness(ps_rare_sotus, split = TRUE, measure = "ACE") %>% dplyr::select("ACE")
# Shannon
shannon <- estimate_richness(ps_rare_sotus, split = TRUE, measure = "Shannon")
# Simpson
Simpson <- estimate_richness(ps_rare_sotus, split = TRUE, measure = "Simpson")
# InvSimpson
InvSimpson <- estimate_richness(ps_rare_sotus, split = TRUE, measure = "InvSimpson")
# Fisher
Fisher <- estimate_richness(ps_rare_sotus, split = TRUE, measure = "Fisher")
# Pielou
pielou <- shannon/log(obs)
colnames(pielou) <- c("pielou")
# q0 - Species richness
q0 <- hill_div(df_raref_otus, qvalue = 0) %>% as.data.frame()
colnames(q0) <- "q0"
#q1 - Shannon diversity
q1 <- hill_div(df_raref_otus, qvalue = 1) %>% as.data.frame()
colnames(q1) <- "q1"
# q2 - Simpson diversity
q2 <- hill_div(df_raref_otus, qvalue = 2) %>% as.data.frame()
colnames(q2) <- "q2"# Create a data frame with alpha diversity indices
df_alpha_div <- cbind(
obs,chao1,ace,shannon,Simpson,InvSimpson,
Fisher,pielou,q0,q1,q2)
# Save the file in .rds format for use in upstream applications in RMarkdown Part 2
saveRDS(df_alpha_div, file = here("rds","df_alpha_div_sotus.rds"))
# # Convert row names of the alpha diversity dataframe to a column named 'sample_id',
# reshape the dataframe to long format, merge with methane yield classification data
# based on 'sample_id', and convert 'variable' to character and 'freedman_class' to factor.
df_alpha_div <- df_alpha_div %>%
rownames_to_column(., var = "sample_id") %>%
melt() %>%
left_join(CH4_yield_df %>% dplyr::select(sample_id, freedman_class), by = "sample_id") %>%
mutate(variable = as.character(variable),
freedman_class = as.factor(freedman_class))# Convert CH4 yield categories into a factor class
freedman_levels <- c("C1","C2","C3","C4","C5","C6")
# Convert freedman_class into a factor with defined levels
df_alpha_div$freedman_class <- factor(df_alpha_div$freedman_class, levels = freedman_levels)# Obtain unique names of diversity indices
sotus_div_names <- unique(df_alpha_div$variable)
# Create an empty list to store the results.
alpha_div_sotus <- list()
# Iterate over each diversity index
for (var in sotus_div_names) {
# Filter relevant data
sotus_data <- df_alpha_div %>%
dplyr::filter(variable == var) %>%
dplyr::select(variable, freedman_class, value) %>%
na.omit()
# Kruskal-Wallis hypothesis test
kruskal <- kruskal.test(value ~ freedman_class, data = sotus_data)
# Pairwise Wilcoxon hypothesis test
wilcoxon <- pairwise.wilcox.test(sotus_data$value, sotus_data$freedman_class,
p.adjust.method = "BH")
# Define fixed colors for each CH4 yield category.
fixed_colors <- c("#339900", "#004cff","#99cc33","#ff9966", "#ffcc00","#cc3300")
# Boxplot plot
p <- ggplot(sotus_data, aes(x = freedman_class, y = value, color = freedman_class)) +
geom_boxplot(outlier.shape = NA, notch = FALSE) +
geom_jitter(width = 0.2, size = 1) +
scale_color_manual(values = setNames(fixed_colors, unique(sotus_data$freedman_class)), drop = FALSE) +
theme_bw() +
labs(title = var) +
theme(plot.title = element_text(size = 10, face = "bold"))
# Save results in the list
alpha_div_sotus[[var]] <- list(
kruskal = kruskal,
wilcoxon = wilcoxon,
figure = p
)
}📈 Statistical analysis for Observed
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 9.5063, df = 5, p-value = 0.0905
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.604 - - - -
C3 0.678 1.000 - - -
C4 0.604 0.678 0.749 - -
C5 0.604 0.678 0.678 1.000 -
C6 0.027 0.028 0.118 0.313 0.209
P value adjustment method: BH
📈 Statistical analysis for Chao1
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 10.273, df = 5, p-value = 0.06785
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.573 - - - -
C3 0.674 0.972 - - -
C4 0.573 0.681 0.674 - -
C5 0.237 0.704 0.674 0.972 -
C6 0.018 0.105 0.118 0.377 0.204
P value adjustment method: BH
📈 Statistical analysis for ACE
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 10.417, df = 5, p-value = 0.06425
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.593 - - - -
C3 0.674 0.981 - - -
C4 0.593 0.692 0.674 - -
C5 0.220 0.692 0.674 1.000 -
C6 0.018 0.070 0.148 0.377 0.181
P value adjustment method: BH
📈 Statistical analysis for Shannon
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 7.7739, df = 5, p-value = 0.1691
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.910 - - - -
C3 0.803 0.834 - - -
C4 0.634 0.834 0.834 - -
C5 0.803 0.834 0.803 0.834 -
C6 0.027 0.077 0.077 0.077 0.803
P value adjustment method: BH
📈 Statistical analysis for Simpson
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 7.8512, df = 5, p-value = 0.1646
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.8977 - - - -
C3 0.8977 0.8762 - - -
C4 0.5688 0.8977 1.0000 - -
C5 0.8762 0.9771 0.8762 0.8977 -
C6 0.0024 0.2704 0.1021 0.1021 0.6653
P value adjustment method: BH
📈 Statistical analysis for InvSimpson
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 7.8512, df = 5, p-value = 0.1646
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.8977 - - - -
C3 0.8977 0.8762 - - -
C4 0.5688 0.8977 1.0000 - -
C5 0.8762 0.9771 0.8762 0.8977 -
C6 0.0024 0.2704 0.1021 0.1021 0.6653
P value adjustment method: BH
📈 Statistical analysis for Fisher
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 9.5063, df = 5, p-value = 0.0905
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.604 - - - -
C3 0.678 1.000 - - -
C4 0.604 0.678 0.749 - -
C5 0.604 0.678 0.678 1.000 -
C6 0.027 0.028 0.118 0.313 0.209
P value adjustment method: BH
📈 Statistical analysis for pielou
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 3.6183, df = 5, p-value = 0.6056
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.90 - - - -
C3 0.90 0.90 - - -
C4 0.90 0.90 0.90 - -
C5 0.90 0.98 0.90 0.90 -
C6 0.15 0.90 0.90 0.42 0.90
P value adjustment method: BH
📈 Statistical analysis for q0
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 9.5063, df = 5, p-value = 0.0905
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.604 - - - -
C3 0.678 1.000 - - -
C4 0.604 0.678 0.749 - -
C5 0.604 0.678 0.678 1.000 -
C6 0.027 0.028 0.118 0.313 0.209
P value adjustment method: BH
📈 Statistical analysis for q1
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 7.7739, df = 5, p-value = 0.1691
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.910 - - - -
C3 0.803 0.834 - - -
C4 0.634 0.834 0.834 - -
C5 0.803 0.834 0.803 0.834 -
C6 0.027 0.077 0.077 0.077 0.803
P value adjustment method: BH
📈 Statistical analysis for q2
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 7.8512, df = 5, p-value = 0.1646
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: sotus_data$value and sotus_data$freedman_class
C1 C2 C3 C4 C5
C2 0.8977 - - - -
C3 0.8977 0.8762 - - -
C4 0.5688 0.8977 1.0000 - -
C5 0.8762 0.9771 0.8762 0.8977 -
C6 0.0024 0.2704 0.1021 0.1021 0.6653
P value adjustment method: BH
Significant test and distribution of alpha diversity indices grouped by methane yield categories identified by different colors (C1-C6) at sOTU level.
# sOTUs count
obs <- estimate_richness(ps_rare_genera, split = TRUE, measure = "Observed")
# Chao1
chao1 <- estimate_richness(ps_rare_genera, split = TRUE, measure = "Chao1") %>% dplyr::select("Chao1")
# Ace
ace <- estimate_richness(ps_rare_genera, split = TRUE, measure = "ACE") %>% dplyr::select("ACE")
# Shannon
shannon <- estimate_richness(ps_rare_genera, split = TRUE, measure = "Shannon")
# Simpson
Simpson <- estimate_richness(ps_rare_genera, split = TRUE, measure = "Simpson")
# InvSimpson
InvSimpson <- estimate_richness(ps_rare_genera, split = TRUE, measure = "InvSimpson")
# Fisher
Fisher <- estimate_richness(ps_rare_genera, split = TRUE, measure = "Fisher")
# Pielou
pielou <- shannon/log(obs)
colnames(pielou) <- c("pielou")
# q0 - Species richness
q0 <- hill_div(df_raref_genera, qvalue = 0) %>% as.data.frame()
colnames(q0) <- "q0"
#q1 - Shannon diversity
q1 <- hill_div(df_raref_genera, qvalue = 1) %>% as.data.frame()
colnames(q1) <- "q1"
# q2 - Simpson diversity
q2 <- hill_div(df_raref_genera, qvalue = 2) %>% as.data.frame()
colnames(q2) <- "q2"# Create a data frame with alpha diversity indices
df_alpha_div <- cbind(
obs,chao1,ace,shannon,Simpson,InvSimpson,
Fisher,pielou,q0,q1,q2)
# Save the file in .rds format for use in upstream applications in RMarkdown Part 2
saveRDS(df_alpha_div, file = here("rds","df_alpha_div_genera.rds"))
# # Convert row names of the alpha diversity dataframe to a column named 'sample_id',
# reshape the dataframe to long format, merge with methane yield classification data
# based on 'sample_id', and convert 'variable' to character and 'freedman_class' to factor.
df_alpha_div <- df_alpha_div %>%
rownames_to_column(., var = "sample_id") %>%
melt() %>%
left_join(CH4_yield_df %>% dplyr::select(sample_id, freedman_class), by = "sample_id") %>%
mutate(variable = as.character(variable),
freedman_class = as.factor(freedman_class))# Convert CH4 yield categories en factor class
freedman_levels <- c("C1","C2","C3","C4","C5","C6")
# Convert freedman_class a factor with 6 levels
df_alpha_div$freedman_class <- factor(df_alpha_div$freedman_class, levels = freedman_levels)# Obtain unique names of diversity indices
genera_div_names <- unique(df_alpha_div$variable)
# Create an empty list to store the results.
alpha_div_genera <- list()
# Iterate over each variable.
for (var in genera_div_names) {
# Filter relevant data
genera_data <- df_alpha_div %>%
dplyr::filter(variable == var) %>%
dplyr::select(variable, freedman_class, value) %>%
na.omit()
# Kruskal-Wallis test
kruskal <- kruskal.test(value ~ freedman_class, data = genera_data)
# Pairwise Wilcoxon test
wilcoxon <- pairwise.wilcox.test(genera_data$value, genera_data$freedman_class,
p.adjust.method = "BH")
# Define fixed colors for each CH4 yield category.
fixed_colors <- c("#339900", "#004cff","#99cc33","#ff9966", "#ffcc00","#cc3300")
# Boxplot
p <- ggplot(genera_data, aes(x = freedman_class, y = value, color = freedman_class)) +
geom_boxplot(outlier.shape = NA, notch = FALSE) +
geom_jitter(width = 0.2, size = 1) +
scale_color_manual(values = setNames(fixed_colors, unique(genera_data$freedman_class)), drop = FALSE) +
theme_bw() +
labs(title = var) +
theme(plot.title = element_text(size = 10, face = "bold"))
# Save results in the list
alpha_div_genera[[var]] <- list(
kruskal = kruskal,
wilcoxon = wilcoxon,
figure = p
)
}📈 Statistical analysis for Observed
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 5.4171, df = 5, p-value = 0.3671
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.63 - - - -
C3 0.59 0.59 - - -
C4 0.59 0.59 0.87 - -
C5 0.59 0.65 0.59 0.59 -
C6 0.59 0.75 0.80 0.59 0.87
P value adjustment method: BH
📈 Statistical analysis for Chao1
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 6.0451, df = 5, p-value = 0.3019
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.87 - - - -
C3 0.54 0.54 - - -
C4 0.54 0.54 0.87 - -
C5 0.66 0.66 0.54 0.54 -
C6 0.66 0.66 0.66 0.62 0.69
P value adjustment method: BH
📈 Statistical analysis for ACE
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 5.1781, df = 5, p-value = 0.3945
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.88 - - - -
C3 0.55 0.55 - - -
C4 0.55 0.55 0.89 - -
C5 0.73 0.73 0.55 0.55 -
C6 0.82 0.73 0.73 0.73 0.82
P value adjustment method: BH
📈 Statistical analysis for Shannon
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 12.675, df = 5, p-value = 0.02662
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.1721 - - - -
C3 0.1063 0.7657 - - -
C4 0.0622 0.7657 0.7657 - -
C5 0.0096 0.7657 0.5254 0.7657 -
C6 0.1063 0.7657 0.9385 0.9385 0.9385
P value adjustment method: BH
📈 Statistical analysis for Simpson
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 7.6303, df = 5, p-value = 0.1778
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.45 - - - -
C3 0.45 0.68 - - -
C4 0.45 0.64 0.64 - -
C5 0.26 0.75 0.45 0.64 -
C6 0.45 0.78 0.59 0.75 0.74
P value adjustment method: BH
📈 Statistical analysis for InvSimpson
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 7.6303, df = 5, p-value = 0.1778
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.45 - - - -
C3 0.45 0.68 - - -
C4 0.45 0.64 0.64 - -
C5 0.26 0.75 0.45 0.64 -
C6 0.45 0.78 0.59 0.75 0.74
P value adjustment method: BH
📈 Statistical analysis for Fisher
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 5.4171, df = 5, p-value = 0.3671
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.63 - - - -
C3 0.59 0.59 - - -
C4 0.59 0.59 0.87 - -
C5 0.59 0.65 0.59 0.59 -
C6 0.59 0.75 0.80 0.59 0.87
P value adjustment method: BH
📈 Statistical analysis for pielou
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 13.643, df = 5, p-value = 0.01804
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.126 - - - -
C3 0.519 0.303 - - -
C4 0.034 0.519 0.587 - -
C5 0.013 0.587 0.303 0.587 -
C6 0.134 0.801 0.519 0.828 1.000
P value adjustment method: BH
📈 Statistical analysis for q0
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 5.4171, df = 5, p-value = 0.3671
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.63 - - - -
C3 0.59 0.59 - - -
C4 0.59 0.59 0.87 - -
C5 0.59 0.65 0.59 0.59 -
C6 0.59 0.75 0.80 0.59 0.87
P value adjustment method: BH
📈 Statistical analysis for q1
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 12.675, df = 5, p-value = 0.02662
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.1721 - - - -
C3 0.1063 0.7657 - - -
C4 0.0622 0.7657 0.7657 - -
C5 0.0096 0.7657 0.5254 0.7657 -
C6 0.1063 0.7657 0.9385 0.9385 0.9385
P value adjustment method: BH
📈 Statistical analysis for q2
🔹 Kruskal-Wallis test
Kruskal-Wallis rank sum test
data: value by freedman_class
Kruskal-Wallis chi-squared = 7.6303, df = 5, p-value = 0.1778
🔹 Wilcoxon test
Pairwise comparisons using Wilcoxon rank sum exact test
data: genera_data$value and genera_data$freedman_class
C1 C2 C3 C4 C5
C2 0.45 - - - -
C3 0.45 0.68 - - -
C4 0.45 0.64 0.64 - -
C5 0.26 0.75 0.45 0.64 -
C6 0.45 0.78 0.59 0.75 0.74
P value adjustment method: BH
Significant test and distribution of alpha diversity indices grouped by methane yield categories identified by different colors (C1-C6) at genus level.
Beta diversity analysis were performed and compared using a rarefied count table at sOTU level and genus level. The relationships between methane yield categories and beta diversity distances were assessed using ANOSIM with 999 permutations (p < 0.05).
# Load database with rarefied sOTUs data
df_raref_otus <- readRDS(file = here("rds","sOTUs_rarefied_counts.rds")) %>%
t() # Rotate
df_raref_genera <- readRDS(file = here("rds","genera_rarefied_counts.rds")) %>%
t() # Rotate
# Load database with methane yield categories
CH4_yield_df <- readRDS(file = here("rds","CH4_yield_df.rds")) %>%
column_to_rownames("sample_id")# Compute Bray-Curtis distance matrix from rarefied OTU table
tax_bc_distmat <- vegdist(df_raref_otus, method = "bray")
# Perform PCoA on Bray-Curtis distance matrix with 6 axes
pcoa_taxa <- cmdscale(tax_bc_distmat, k = 6)# Create dataframe with first two PCoA axes from taxa
pcoa_df <- data.frame(PC1 = pcoa_taxa[, 1], PC2 = pcoa_taxa[, 2])
# Assign freedman class to pcoa_df based on row names
pcoa_df$freedman_class <- CH4_yield_df[rownames(pcoa_df), "freedman_class"]
# Convert CH4 yield categories en factor class
freedman_levels <- c("C1","C2","C3","C4","C5","C6")
# Convert freedman_class a factor with 6 levels
pcoa_df$freedman_class <- factor(pcoa_df$freedman_class, levels = freedman_levels)
# Remove rows with NA values in freedman_class column
pcoa_df <- pcoa_df[!is.na(pcoa_df$freedman_class),]# Convert the taxonomic Bray-Curtis distance matrix to a matrix
tax_bc_distmat_mat <- as.matrix(tax_bc_distmat)
# Subset the distance matrix based on matching row names with pcoa_df
tax_bc_distmat_filt <- tax_bc_distmat_mat[row.names(tax_bc_distmat_mat) %in% row.names(pcoa_df),]
# Perform ANOSIM test with 999 permutations on filtered distance matrix
anosim_result <- anosim(tax_bc_distmat_filt, pcoa_df$freedman_class, perm = 999)# Create a data frame with formatted statistics for PCoA and ANOSIM
results_df <- data.frame(
Var_PCoA1 = formatC(var(pcoa_df$PC1), digits = 2, format = "E", flag = "#"),
Var_PCoA2 = formatC(var(pcoa_df$PC2), digits = 2, format = "E", flag = "#"),
ANOSIM_R = formatC(anosim_result$statistic, digits = 2, format = "E", flag = "#"),
ANOSIM_p = formatC(anosim_result$signif, digits = 2, format = "E", flag = "#"),
stringsAsFactors = FALSE
)
# Display the dataframe in an interactive table format
datatable(results_df)# Plot PCoA
plot(pcoa_df$PC1, pcoa_df$PC2, type = "n", main = "PCoA - Bray-Curtis - sOTUs",
xlab = expression(bold("Axis 1 (7.93%)")), ylab = expression(bold("Axis 2 (4.03%)")))
points(pcoa_df$PC1, pcoa_df$PC2, pch = 19, cex = 0.6,
col = c("#cc3300", "#ff9966", "blue", "#ffcc00", "#99cc33", "#339900")[pcoa_df$freedman_class])Principal coordinate analysis (PCoA) depicts microbial community dynamics through methane yield categories (C1-C6) at sOTUs level.
# Compute Bray-Curtis distance matrix from rarefied genera table
tax_bc_distmat <- vegdist(df_raref_genera, method = "bray")
# Perform PCoA on Bray-Curtis distance matrix with 6 axes
pcoa_taxa <- cmdscale(tax_bc_distmat, k = 6)# Create dataframe with first two PCoA axes from taxa
pcoa_df <- data.frame(PC1 = pcoa_taxa[, 1], PC2 = pcoa_taxa[, 2])
# Assign freedman class to pcoa_df based on row names
pcoa_df$freedman_class <- CH4_yield_df[rownames(pcoa_df), "freedman_class"]
# Convert CH4 yield categories en factor class
freedman_levels <- c("C1","C2","C3","C4","C5","C6")
# Convert freedman_class a factor with 6 levels
pcoa_df$freedman_class <- factor(pcoa_df$freedman_class, levels = freedman_levels)
# Remove rows with NA values in freedman_class column
pcoa_df <- pcoa_df[!is.na(pcoa_df$freedman_class),]# Convert the taxonomic Bray-Curtis distance matrix to a matrix
tax_bc_distmat_mat <- as.matrix(tax_bc_distmat)
# Subset the distance matrix based on matching row names with pcoa_df
tax_bc_distmat_filt <- tax_bc_distmat_mat[row.names(tax_bc_distmat_mat) %in% row.names(pcoa_df),]
# Perform ANOSIM test with 999 permutations on filtered distance matrix
anosim_result <- anosim(tax_bc_distmat_filt, pcoa_df$freedman_class, perm = 999)# Create a data frame with formatted statistics for PCoA and ANOSIM
results_df <- data.frame(
Var_PCoA1 = formatC(var(pcoa_df$PC1), digits = 2, format = "E", flag = "#"),
Var_PCoA2 = formatC(var(pcoa_df$PC2), digits = 2, format = "E", flag = "#"),
ANOSIM_R = formatC(anosim_result$statistic, digits = 2, format = "E", flag = "#"),
ANOSIM_p = formatC(anosim_result$signif, digits = 2, format = "E", flag = "#"),
stringsAsFactors = FALSE
)
# Display the dataframe in an interactive table format
datatable(results_df)# Plot PCoA
plot(pcoa_df$PC1, pcoa_df$PC2, type = "n", main = "PCoA - Bray-Curtis - Genera",
xlab = expression(bold("Axis 1 (3.57%)")), ylab = expression(bold("Axis 2 (3.13%)")))
points(pcoa_df$PC1, pcoa_df$PC2, pch = 19, cex = 0.6,
col = c("#cc3300", "#ff9966", "blue", "#ffcc00", "#99cc33", "#339900")[pcoa_df$freedman_class])Principal coordinate analysis (PCoA) depicts microbial community dynamics through methane yield categories (C1-C6) at genus level.
2MMS
(..)
_O__( u )__0_