Abstract
Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths globally, highlighting the need for effective screening strategies. While colonoscopy is considered the gold standard for CRC detection, its invasiveness and high cost limit its widespread use. This study explores the potential of stool-based microRNAs as non-invasive, cost-effective biomarkers to enhance CRC screening, complementing existing methods such as the Fecal Immunochemical Test (FIT). Our findings aim to support the development of improved screening tools that are accessible and acceptable for population-level implementation.
MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression and play an essential role in various biological processes. They are increasingly recognised as biomarkers for numerous diseases, including colorectal cancer, due to their involvement in tumour progression, metastasis, and immune response.
In cancer epidemiology, miRNAs hold promise as diagnostic tools, prognostic markers, and therapeutic targets. This analysis re-make the result from the reports in Clinical Cancer Research (2014), British Journal of Cancer (2014) and Oncotarget (2016) which highlighted miRNAs’ significance in understanding cancer pathways and outcomes. Identifying miRNAs with notable expression changes could advance their use as biomarkers for early disease detection and treatment.
This analyses qPCR array data to identify the most highly expressed miRNAs in a given pre-processed dataset deposited in Clinical Cancer Research (2014). Here, I include statistical analysis of fold changes and calculate medians and visualising the key miRNAs that warrant further investigation for public health and clinical applications.
The libraries necessary for this analysis are loaded below. Missing libraries will be installed automatically.
# Install packages if not already installed
if (!requireNamespace("pROC", quietly = TRUE)) install.packages("pROC")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("openxlsx", quietly = TRUE)) install.packages("openxlsx")
if (!requireNamespace("rstatix", quietly = TRUE)) install.packages("rstatix")
if (!requireNamespace("ggpubr", quietly = TRUE)) install.packages("ggpubr")
if (!requireNamespace("RColorBrewer", quietly = TRUE)) install.packages("RColorBrewer")
# Load Libraries
library(pROC)
library(ggplot2)
library(dplyr)
library(openxlsx)
library(rstatix)
library(ggpubr)
library(RColorBrewer)
The pre-processed data is cleaned and formatted to ensure accurate and meaningful analysis. Key steps include:
Removing extraneous data, such as metadata and null values.
Standardising the dataset to facilitate calculations and visualisations.
The dataset is structured for statistical computations, avoiding potential biases caused by unprocessed or incomplete data.
# URL of the array data file published in CCR (Supplementary Table 3)
url <- "https://ndownloader.figstatic.com/files/39900485"
# Load data from Excel file without column names
qPCR.array <- read.xlsx(url, colNames = FALSE)
# Clean the first column by removing the "-#number" suffix
qPCR.array[[1]] <- gsub("-\\d+$", "", qPCR.array[[1]])
# Set the value "miRNA" to the second row of the first column
qPCR.array[2, 1] <- "miRNA"
# Use the second row as column names
colnames(qPCR.array) <- qPCR.array[2, ]
# Remove the first three rows and last three columns
qPCR.array <- qPCR.array[-c(1:3), ]
qPCR.array <- qPCR.array[, -c(7:9)]
# Remove rows containing "Null"
qPCR.array <- qPCR.array[apply(qPCR.array, 1, function(x) !any(x == "Null")), ]
# Convert columns 2 to 6 to numeric
qPCR.array[, 2:6] <- lapply(qPCR.array[, 2:6], as.numeric)
To identify candidate miRNAs, we calculate the median fold change (a measure of central tendency) from 5 pairs of cancer-adjacent normal colonic tissue samples for each miRNA. These metrics are essential for evaluating the fold change and variability across samples.
# Extract only columns 2 to 6 for calculations
data_numeric <- qPCR.array[, 2:6]
# Calculate median for each miRNA
Medians <- apply(data_numeric, 1, median, na.rm = TRUE)
# Create a data frame with miRNA names, medians, and SEM
miRNA_stats <- data.frame(
miRNA = qPCR.array[[1]],
Medians = Medians
)
# Get the top 12 miRNAs by median
top_miRNA_stats <- miRNA_stats[order(-miRNA_stats$Medians), ][1:12, ]
A visual representation of the top 12 miRNAs is provided through bar plots using the median.
Note that including error bars could be misleading due to the small sample size (5 paired samples), and omitting them helps avoid presenting potentially misleading information.
# Plot the top 12 miRNAs
ggplot(top_miRNA_stats, aes(x = reorder(miRNA, Medians), y = Medians, fill = miRNA)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(x = "microRNA", y = "Fold Change") +
theme_classic() +
theme(
text = element_text(size = 15, colour = "black"),
axis.ticks = element_line(colour = "black", size = 1.25),
axis.line = element_line(colour = 'black', size = 1.25),
axis.text.x = element_text(angle = 0, hjust = 0.5, colour = "black", size = 12),
axis.text.y = element_text(angle = 30, hjust = 1, colour = "black", size = 12),
axis.title.y = element_text(color = "black", size = 14, face = "bold"),
axis.title.x = element_text(color = "black", size = 14, face = "bold"),
legend.position = "none"
)
Clean up the unwanted objects to keep the R environment tidy.
rm(data_numeric, miRNA_stats, qPCR.array, top_miRNA_stats, Medians, url)
The predictive performance of specific miRNAs for differentiating between cancer and normal samples is assessed using Receiver Operating Characteristic (ROC) curves. This analysis examines:
Sensitivity (true positive rate).
Specificity (true negative rate).
Area Under the Curve (AUC) as an overall performance measure.
The study identifies miRNAs with potential utility as cancer detection biomarkers.
# Load Data
data <- read.xlsx("20241201 List of stool samples.xlsx")
We load the dataset and filter it to include only the “Cancer” and “Normal” groups. The true labels (1 for cancer, 0 for normal) are extracted for use in ROC curve analysis.
# Filter data for "Cancer" and "Normal" groups
filtered_data <- data %>%
filter(Group %in% c("Cancer", "Normal"))
# Extract true labels
true_labels <- ifelse(filtered_data$Group == "Cancer", 1, 0)
# Define predictive parameters
predictive_params <- c("miR-18a.copies/ng", "20a.copies/ng",
"135b.copies/ng", "221.copies/ng")
For each predictive parameter, we calculate ROC curves and extract sensitivity, specificity, and AUC values. Annotations and arrow positions for visualisation are also prepared.
# Initialize data containers
roc_data_list <- data.frame()
annotation_points <- data.frame()
arrow_positions <- data.frame()
# Calculate ROC Curves and Annotations
for (param in predictive_params) {
# Predicted scores
predicted_scores <- filtered_data[[param]]
# Generate ROC curve
roc_curve <- roc(true_labels, predicted_scores)
# Extract ROC curve data for ggplot
roc_data <- data.frame(
Specificity = rev(roc_curve$specificities),
Sensitivity = rev(roc_curve$sensitivities),
Parameter = param
)
# Append to the overall data frame
roc_data_list <- rbind(roc_data_list, roc_data)
# Sensitivity and specificity at 80% specificity
coords_at_80_spec <- coords(roc_curve, x = 0.8, input = "specificity", ret = c("sensitivity", "specificity"))
sen <- coords_at_80_spec["sensitivity"]
spe <- coords_at_80_spec["specificity"]
# Annotation points
annotation_points <- rbind(annotation_points, data.frame(
x = c(0.25, 0.5, 0.125, 0.8)[which(predictive_params == param)],
y = c(0.15, 0.4, 0.85, 0.4)[which(predictive_params == param)],
label = sprintf("%s\nSen=%.2f\nSpe=%.2f", param, sen, spe),
Parameter = param
))
# Arrow positions
annotation_x <- 1 - roc_data$Specificity
annotation_y <- roc_data$Sensitivity
target_x <- annotation_points$x[nrow(annotation_points)]
target_y <- annotation_points$y[nrow(annotation_points)]
distances <- (annotation_x - target_x)^2 + (annotation_y - target_y)^2
closest_index <- which.min(distances)
# Store arrow starting positions
arrow_positions <- rbind(arrow_positions, data.frame(
x = annotation_x[closest_index],
y = annotation_y[closest_index],
Parameter = param
))
}
# Adjust arrow end positions
adjusted_arrow_positions <- annotation_points %>%
mutate(
xend = ifelse(x < 0.5, x - 0.05, x + 0.05),
yend = ifelse(y < 0.5, y - 0.05, y + 0.05)
)
# Combine arrow start and end positions
arrows <- merge(adjusted_arrow_positions, arrow_positions, by = "Parameter")
We plot the ROC curves with sensitivity, specificity, and AUC annotations using ggplot2, this includes, renaming, calculating sensitivity and specificity at 80% specificity for a better explaination.
# Combine arrow start and end positions correctly
arrows <- merge(
adjusted_arrow_positions,
arrow_positions,
by = "Parameter",
suffixes = c(".label", ".curve") # Clear distinction between label and curve columns
)
# Check the content of `arrows` to ensure alignment
print(arrows)
## Parameter x.label y.label label xend
## 1 135b.copies/ng 0.125 0.85 135b.copies/ng\nSen=0.58\nSpe=0.80 0.075
## 2 20a.copies/ng 0.500 0.40 20a.copies/ng\nSen=0.55\nSpe=0.80 0.550
## 3 221.copies/ng 0.800 0.40 221.copies/ng\nSen=0.51\nSpe=0.80 0.850
## 4 miR-18a.copies/ng 0.250 0.15 miR-18a.copies/ng\nSen=0.46\nSpe=0.80 0.200
## yend x.curve y.curve
## 1 0.90 0.29797980 0.6969697
## 2 0.35 0.33333333 0.6313131
## 3 0.35 0.40909091 0.6868687
## 4 0.10 0.03030303 0.2070707
# Modify the labels in the annotation_points data frame and retain the ROC-related values
annotation_points$label <- dplyr::recode(
annotation_points$Parameter,
"miR-18a.copies/ng" = "miR-18a",
"20a.copies/ng" = "miR-20a",
"135b.copies/ng" = "miR-135b",
"221.copies/ng" = "miR-221"
)
# Recalculate sensitivity, specificity, and AUC for each parameter and update the labels
for (param in predictive_params) {
# Extract the ROC curve for each parameter
roc_curve <- roc(true_labels, filtered_data[[param]])
# Calculate sensitivity and specificity at 80% specificity
coords_at_80_spec <- coords(roc_curve, x = 0.8, input = "specificity", ret = c("sensitivity", "specificity"))
sen <- coords_at_80_spec["sensitivity"]
spe <- coords_at_80_spec["specificity"]
# Calculate AUC for the current ROC curve
auc_value <- auc(roc_curve)
# Update the annotation_points data frame with the calculated sen, spe, and AUC for each parameter
annotation_points[annotation_points$Parameter == param, "sen"] <- sen
annotation_points[annotation_points$Parameter == param, "spe"] <- spe
annotation_points[annotation_points$Parameter == param, "AUC"] <- auc_value
# Update the label with sen, spe, and AUC values
annotation_points[annotation_points$Parameter == param, "label"] <-
sprintf("%s\nSen: %.2f\nSpe: %.2f\nAUC: %.2f", annotation_points[annotation_points$Parameter == param, "label"], sen, spe, auc_value)
}
# Reformat labels for clarity
ggplot(data = roc_data_list, aes(x = 1 - Specificity, y = Sensitivity, color = Parameter)) +
geom_line(size = 1) + # Plot ROC curves
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") + # Diagonal line
labs(
# title = "ROC Curves with Sensitivity, Specificity, and AUC",
x = "1 - Specificity (False Positive Rate)",
y = "Sensitivity (True Positive Rate)"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "none",
legend.title = element_text(face = "bold"
)
) +
scale_color_discrete(name = "Predictive Parameters") + # Legend title
# Add arrows above the labels
geom_segment(
data = arrows,
aes(
x = x.label, # Start x-coordinate (from the label)
y = y.label, # Start y-coordinate (from the label)
xend = x.curve, # End x-coordinate (closest point on the curve)
yend = y.curve, # End y-coordinate (closest point on the curve)
color = Parameter # Match arrow color to the curve
),
inherit.aes = FALSE,
arrow = arrow(length = unit(0.02, "npc")),
size = 1.5
) +
# Use geom_label to add background color to labels
geom_label(
data = annotation_points,
aes(x = x, y = y, label = label, fill = Parameter), # Use 'fill' for background color
inherit.aes = FALSE,
size = 4,
label.size = 0.5, # Border size around the label
label.padding = unit(0.3, "lines"), # Padding inside the label
color = "white", # Text color
alpha = 1 # Transparency of the label background
)
A logistic regression model was used for the combined ROC curve analysis, which predicts the probability of a binary outcome (Cancer vs. Normal) based on multiple predictors, including the expression levels of four microRNAs (miR-18a, miR-221, miR-20a, miR-135b).
The logistic regression model estimates the likelihood of the outcome occurring (Cancer) for each observation. Predicted probabilities from this model are used to plot a ROC curve, which evaluates the model’s performance by showing the trade-off between sensitivity (true positive rate) and specificity (1 - false positive rate) at different decision thresholds.
The model formula:
logit(P)=β0+β1(miR-18a)+β2(miR-221)+β3(miR-20a)+β4(miR-135b)
# Create the binary outcome variable (1 for Cancer, 0 for Normal)
true_labels <- ifelse(filtered_data$Group == "Cancer", 1, 0)
# Fit a logistic regression model with multiple predictors
model_multi <- glm(true_labels ~ `miR-18a.copies/ng` +
`221.copies/ng` +
`20a.copies/ng` +
`135b.copies/ng`,
data = filtered_data,
family = binomial)
# Get predicted probabilities for the new model
predicted_probs_multi <- predict(model_multi, type = "response")
# Compute ROC curve using pROC
roc_curve_multi <- roc(true_labels, predicted_probs_multi)
# Extract data for ggplot
roc_data <- data.frame(
Specificity = rev(roc_curve_multi$specificities),
Sensitivity = rev(roc_curve_multi$sensitivities)
)
# Find the sensitivity at 80% specificity
target_specificity <- 0.8
# Find the closest specificity value to 0.8
closest_index <- which.min(abs(roc_data$Specificity - target_specificity))
# Extract the corresponding sensitivity and specificity
sensitivity_at_80_spec <- roc_data$Sensitivity[closest_index]
specificity_at_80_spec <- roc_data$Specificity[closest_index]
# Display the result
print(paste("Sensitivity at 80% specificity: ", round(sensitivity_at_80_spec, 3)))
## [1] "Sensitivity at 80% specificity: 0.621"
print(paste("Specificity at 80% specificity: ", round(specificity_at_80_spec, 3)))
## [1] "Specificity at 80% specificity: 0.798"
# Calculate AUC
auc_multi <- auc(roc_curve_multi)
auc_label <- paste0("AUC: ", round(auc_multi, 3))
# Create the annotation points for labeling on the ROC plot at (0.5, 0.5)
annotation_points <- data.frame(
x = 0.6, # Label at (0.5, 0.5)
y = 0.4, # Label at (0.5, 0.5)
label = sprintf("Multi-Predictor\nSen: %.2f\nSpe: %.2f\nAUC: %.2f", sensitivity_at_80_spec, specificity_at_80_spec, auc_multi),
Parameter = "Multi-Predictor"
)
# Define custom arrow positions, making the arrow point to the closest point on the curve
arrow_positions <- data.frame(
x = 0.375, # Start of the arrow at (0.5, 0.5)
y = 0.75, # Start of the arrow at (0.5, 0.5)
Parameter = "Multi-Predictor"
)
# Combine arrow start and end positions
arrows <- merge(
annotation_points,
arrow_positions,
by = "Parameter",
suffixes = c(".label", ".curve")
)
# Plot the ROC curve using ggplot with AUC and sensitivity annotations
ggplot(roc_data, aes(x = 1 - Specificity, y = Sensitivity)) +
geom_line(color = "saddlebrown", size = 1.2) + # ROC curve
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") + # Diagonal line
labs(
x = "1 - Specificity (False Positive Rate)",
y = "Sensitivity (True Positive Rate)"
) +
# Add arrows pointing to the labels
geom_segment(
data = arrows,
aes(
x = x.label, # Start x-coordinate (from the label)
y = y.label, # Start y-coordinate (from the label)
xend = x.curve, # End x-coordinate (closest point on the curve)
yend = y.curve, # End y-coordinate (closest point on the curve)
color = Parameter # Match arrow color to the curve
),
inherit.aes = FALSE,
arrow = arrow(length = unit(0.02, "npc")),
size = 1.5
) +
# Use geom_label to add background color to labels
geom_label(
data = annotation_points,
aes(x = x, y = y, label = label, fill = Parameter), # Use 'fill' for background color
inherit.aes = FALSE,
size = 4,
label.size = 0.5, # Border size around the label
label.padding = unit(0.3, "lines"), # Padding inside the label
color = "white", # Text color
alpha = 1 # Transparency of the label background
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "none",
legend.title = element_text(face = "bold")
) +
xlim(0, 1) + # Adjust axis limits to make sure the arrows and labels are within view
ylim(0, 1)
Remove all the unwanted objects except the data object for the next analysis.
# Remove all objects except "data"
rm(list = setdiff(ls(), "data"))
To address the non-normal distribution of miR-135b levels, a log10 transformation is applied. A small constant (+1) is added to avoid taking the logarithm of zero.
# Apply log10 transformation to miR-135b data
data$log_miR_135b <- log10(data$`135b.copies/ng` + 1) # Add 1 to avoid log(0)
# Set the order of the groups
data$Group <- factor(data$Group, levels = c("Normal", "Adenoma", "AN", "Cancer"))
The Wilcoxon rank-sum test is used for the group comparisons of the transformed miR-135b levels. P-values are adjusted using the Bonferroni correction method.
# Perform pairwise Wilcoxon test for log-transformed miR-135b values
stat.test <- data %>%
wilcox_test(log_miR_135b ~ Group, p.adjust.method = "bonferroni") %>%
add_significance("p") %>%
filter(p.adj.signif != "ns") # Keep only significant comparisons
# Adjust the positions for the p-value labels
stat.test <- stat.test %>%
mutate(y.position = seq(max(data$log_miR_135b) + 0.1, by = 0.3, length.out = nrow(stat.test)))
Scatter plots, overlaid with box plots, illustrate the differences in miR-135b levels across groups. Annotated p-values highlight significant findings, supporting transparent interpretation.
# Plot the scatter plot with log-transformed miR-135b values and box plots
ggplot(data, aes(x = Group, y = log_miR_135b, color = Group)) +
geom_point(position = "jitter", size = 3, alpha = 0.7) +
geom_boxplot(
aes(x = Group, y = log_miR_135b, fill = Group),
alpha = 0.3,
color = "black",
outlier.shape = NA,
width = 0.3
) +
stat_pvalue_manual(stat.test, label = "p.signif", size = 4, face = "bold") +
labs(
x = "Group",
y = "Stool miR-135b level, log10"
) +
theme_classic() +
theme(
text = element_text(size = 15, colour = "black"),
axis.ticks = element_line(colour = "black", size = 1),
axis.line = element_line(colour = 'black', size = 1),
axis.text.x = element_text(angle = 0, hjust = 0.5, colour = "black", size = 13, face = "bold"),
axis.text.y = element_text(angle = 0, hjust = 0.5, colour = "black", size = 13, face = "bold"),
axis.title.y = element_text(color = "black", size = 15, face = "bold"),
axis.title.x = element_text(color = "black", size = 15, face = "bold"),
legend.position = "none"
) +
scale_y_continuous(limits = c(-0.5, 5.25), breaks = c(0, 1, 2, 3, 4, 5))
We also applied a similar approach to analyse miR-135b levels in cancer patients stratified by tumour location (proximal vs. distal), considering that Fecal Immunochemical Test also shows varying sensitivity and specificity depending on the context.
The analysis focuses on cancer group data, ensuring only relevant comparisons are performed. Statistical significance is tested.
# Filter the data to include only the Cancer group and remove rows with NA values
cancer_data <- data %>%
filter(Group == "Cancer") %>% # Retain rows where Group is "Cancer"
drop_na(Location) # Remove rows where 'Location' is NA
# Rename the Location groups
cancer_data$Location <- recode(cancer_data$Location, "1" = "PROXIMAL", "2" = "DISTAL")
# Set the order of the Location groups
cancer_data$Location <- factor(cancer_data$Location, levels = c("PROXIMAL", "DISTAL"))
# Perform Wilcoxon rank-sum test for log-transformed miR-135b values between two independent groups
stat.test <- cancer_data %>%
wilcox_test(log_miR_135b ~ Location) %>%
add_significance("p")
# Adjust the positions for the p-value labels
stat.test <- stat.test %>%
mutate(y.position = seq(max(cancer_data$log_miR_135b) + 0.1, by = 0.3, length.out = nrow(stat.test)))
Scatter plots with overlaid box plots are used to visualise miR-135b levels by tumour location. Significant differences are annotated to inform potential clinical implications.
# Plot the scatter plot with log-transformed miR-135b values and box plots for Location
ggplot(cancer_data,
aes(x = Location, y = log_miR_135b, fill = Location)) +
geom_point(aes(color = Location),
position = position_jitter(width = 0.2), size = 2.5, alpha = 1) +
geom_violin(trim = TRUE, color = NA, alpha = 0.5) +
geom_boxplot(width = 0.12, fill = "white", , alpha = 0.7) +
stat_pvalue_manual(stat.test, label = "p.signif",
tip.length = 0.03,
inherit.aes = FALSE,
y.position = 4.2,
size = 6,
face = "bold",
vjust = 0,
hjust = 0) +
labs(
x = "Location",
y = "Stool miR-135b level, log10"
) +
scale_fill_brewer(palette = "Set1") +
scale_color_brewer(palette = "Set1") +
scale_y_continuous(limits = c(-0.05, 4.3),
breaks = c(0, 1, 2, 3, 4)) +
theme_classic() +
theme(
text = element_text(size = 15, colour = "black"),
axis.ticks = element_line(colour = "black", size = 1.25),
axis.line = element_line(colour = 'black', size = 1.25),
axis.text.x = element_text(angle = 45, hjust = 1, colour = "black", size = 13, face = "bold"),
axis.text.y = element_text(angle = 0, hjust = 0.5, colour = "black", size = 13, face = "bold"),
axis.title.y = element_text(color = "black", size = 15, face = "bold"),
axis.title.x = element_text(color = "black", size = 15, face = "bold"),
legend.position = "none"
)
Antibiotics can alter the composition of the intestinal microbiota, which may, in turn, affect miRNA expression in feces. To investigate this, we examined the impact of antibiotics on fecal miR-20a levels by comparing patients who had taken antibiotics within 30 days prior to faecal sample collection with those who had not.
We filter the dataset to retain only cancer patients and categorise the Antibiotics variable into “YES” and “NO.”
# Filter the data to include only the Cancer group
cancer_data <- data %>% filter(Group == "Cancer")
# Rename the Antibiotics groups
cancer_data$Antibiotics <- recode(cancer_data$Antibiotics, "N" = "NO", "Y" = "YES")
# Set the order of the Antibiotics groups
cancer_data$Antibiotics <- factor(cancer_data$Antibiotics, levels = c("NO", "YES"))
The Wilcoxon rank-sum test compares log-transformed miR-135b levels between antibiotic groups. Bonferroni correction adjusts for multiple comparisons, and annotation positions are customised.
# Perform pairwise Wilcoxon test for log10-transformed miR-135b values
stat.test <- cancer_data %>%
wilcox_test(log_miR_135b ~ Antibiotics) %>%
add_significance("p")
# Adjust the positions for the p-value labels
stat.test <- stat.test %>%
mutate(y.position = seq(max(cancer_data$log_miR_135b) + 0.1, by = 0.3, length.out = nrow(stat.test)))
We use a scatter plot with overlaid boxplots to visualise log-transformed miR-135b levels for antibiotic usage groups. Significant differences are annotated.
# Plot the scatter plot with log-transformed miR-135b values and box plots for Antibiotics
ggplot(cancer_data,
aes(x = Antibiotics, y = log_miR_135b, fill = Antibiotics)) +
geom_point(aes(color = Antibiotics), # Points mapped to Antibiotics
position = position_jitter(width = 0.2), size = 2.5, alpha = 1) + # Jitter for spreading
geom_violin(trim = TRUE, color = NA, alpha = 0.5) + # Violin plot
geom_boxplot(width = 0.12, fill = "white", alpha = 0.7) + # Boxplot inside the violin
stat_pvalue_manual(stat.test, label = "p.signif",
tip.length = 0.03,
inherit.aes = FALSE,
y.position = 4.2,
size = 6,
face = "bold",
vjust = 0,
hjust = 0) +
labs(
x = "Antibiotics",
y = "Stool miR-135b level, log10"
) +
scale_fill_brewer(palette = "Set1") + # Violin fill colors
scale_color_brewer(palette = "Set1") + # Point colors
scale_y_continuous(limits = c(-0.05, 4.3),
breaks = c(0, 1, 2, 3, 4)) +
theme_classic() +
theme(
text = element_text(size = 15, colour = "black"),
axis.ticks = element_line(colour = "black", size = 1.25),
axis.line = element_line(colour = 'black', size = 1.25),
axis.text.x = element_text(angle = 45, hjust = 1, colour = "black", size = 13, face = "bold"),
axis.text.y = element_text(angle = 0, hjust = 0.5, colour = "black", size = 13, face = "bold"),
axis.title.y = element_text(color = "black", size = 15, face = "bold"),
axis.title.x = element_text(color = "black", size = 15, face = "bold"),
legend.position = "none"
)
sessionInfo()
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8
## [2] LC_CTYPE=English_United Kingdom.utf8
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.utf8
##
## time zone: Europe/London
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-3 ggpubr_0.6.0 rstatix_0.7.2 openxlsx_4.2.7.1
## [5] dplyr_1.1.4 ggplot2_3.5.1 pROC_1.18.5
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3 tidyr_1.3.1
## [5] stringi_1.8.4 digest_0.6.36 magrittr_2.0.3 evaluate_0.24.0
## [9] grid_4.4.2 fastmap_1.2.0 plyr_1.8.9 jsonlite_1.8.8
## [13] zip_2.3.1 backports_1.5.0 purrr_1.0.2 fansi_1.0.6
## [17] scales_1.3.0 jquerylib_0.1.4 abind_1.4-5 cli_3.6.3
## [21] rlang_1.1.4 munsell_0.5.1 withr_3.0.0 cachem_1.1.0
## [25] yaml_2.3.9 tools_4.4.2 ggsignif_0.6.4 colorspace_2.1-0
## [29] broom_1.0.6 vctrs_0.6.5 R6_2.5.1 lifecycle_1.0.4
## [33] car_3.1-2 pkgconfig_2.0.3 pillar_1.9.0 bslib_0.7.0
## [37] gtable_0.3.5 glue_1.7.0 Rcpp_1.0.12 highr_0.11
## [41] xfun_0.45 tibble_3.2.1 tidyselect_1.2.1 rstudioapi_0.16.0
## [45] knitr_1.48 farver_2.1.2 htmltools_0.5.8.1 rmarkdown_2.27
## [49] carData_3.0-5 labeling_0.4.3 compiler_4.4.2