传统富集气泡图通常同时表达 4 个维度:
Description
(y 轴)、GeneRatio
/count
(x 轴)、Input.number
(气泡大小)、P.adjust/FDR
(颜色)。 本文在气泡图左侧加入两层 桑基/Alluvial(layer2 -> layer3
),把每个 pathway 中的 命中基因数(Input.number) 映射为流的宽度,从而实现第 5 个维度(“基因-通路归属的流量”)的直观呈现。
count = Input/Background
,size =
Input.number
,color = FDR
。layer2 → layer3
的 alluvial,流宽 =
Input.number
;节点标签显示两端层级。layer3
的位置与右侧气泡的行一一对应。# 如缺包会自动安装(可按需注释)
need <- c("tidyverse","ggalluvial","RColorBrewer","cowplot")
for (p in need) if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
invisible(lapply(need, library, character.only = TRUE))
set.seed(2025)
笔记:
ggsankey
在部分 R 版本不可用,因此本文统一采用 CRAN 上更稳的ggalluvial
。关键几何对象: -geom_alluvium()
:画流(可映射y
为权重/宽度) -geom_stratum()
:画节点 -geom_text(stat="stratum")
:给节点自动加标签
grp <- list(
"1.0 Biosynthesis" =
c("Prodigiosin biosynthesis","Benzoxazinoid biosynthesis","Zeatin biosynthesis"),
"1.1 Carbohydrate" =
c("Glyoxylate and dicarboxylate metabolism","Polyketide sugar unit biosynthesis","Pentose phosphate pathway"),
"1.2 Energy metabolism" =
c("Nitrogen metabolism","Sulfur metabolism"),
"1.3 Lipid metabolism" =
c("Glycerolipid metabolism","Arachidonic acid metabolism","Linoleic acid metabolism"),
"1.5 Metabolism" =
c("Cyanamino acid metabolism","Fluorobenzoate degradation","Naphthalene degradation"),
"1.7 Glycan" =
c("Glycosaminoglycan degradation","Glycan biosynthesis (general)"),
"1.9 Metabolism" =
c("Limonene and pinene degradation","Polyketide backbone biosynthesis"),
"1.11 Xenobiotics" =
c("Metabolism of xenobiotics by cytochrome P450","Chloroalkane and chloroalkene degradation","Toluene degradation")
)
enrich_demo <- purrr::imap_dfr(grp, ~ tibble(layer2 = .y, layer3 = .x)) %>%
mutate(
Background.number = sample(seq(160, 420, by = 10), n(), replace = TRUE),
Input.number = pmax(3L, round(Background.number * runif(n(), 0.08, 0.30))),
FDR = pmin(1, stats::rbeta(n(), 0.8, 3.2))
)
enrich_demo %>% dplyr::sample_n(5)
字段要求(如换成你自己的数据): -
layer2
:上层类目(如 KEGG 2 级) -layer3
:pathway 名称(右图 y 轴 & 左图右侧节点) -Input.number
:命中基因数(控制气泡大小 & 流宽) -Background.number
:背景数(用于计算count = Input/Background
) -FDR
:校正后的 p 值(控制气泡颜色)
读取你自己的文件示例(制表符分隔):
library(readr) raw <- readr::read_tsv("DEG.Ko.enrich.xls") # 选取并重命名为本文所需字段名 df0 <- raw %>% dplyr::select(layer2, layer3, Input.number, Background.number, FDR)
# 计算 count,选前 20 个 pathway(可改 n)
df <- enrich_demo %>%
mutate(count = Input.number / Background.number) %>%
arrange(desc(count)) %>%
slice_head(n = 20)
# 为了让左、右两图对齐:把 layer3 设为“反向因子”(上到下与气泡行一致)
df$layer3 <- factor(df$layer3, levels = df$layer3 %>% unique() %>% rev())
df$layer2 <- factor(df$layer2, levels = df$layer2 %>% unique())
# 预览(前 10 行)
df %>% select(layer2, layer3, Input.number, Background.number, count, FDR) %>% head(10)
p_bubble <- df %>%
ggplot(aes(x = count, y = layer3)) +
geom_point(aes(size = Input.number, color = FDR, fill = FDR), shape = 19) +
scale_color_gradientn(colours = RColorBrewer::brewer.pal(6, "RdBu")) +
scale_fill_gradientn(colours = RColorBrewer::brewer.pal(6, "RdBu")) +
guides(size = guide_legend(title = "Count")) +
theme_bw() +
theme(
axis.title = element_blank(),
axis.text.y = element_blank(), # y 轴文字交给左侧节点
axis.ticks.y = element_blank(),
legend.key = element_blank(),
legend.title = element_text(size = 10),
legend.text = element_text(size = 8),
plot.margin = margin(5.5, 5.5, 5.5, 0)
)
p_bubble
右:富集气泡图。x=count=Input/Background;气泡大小=Input.number;颜色=FDR (RdBu 渐变)。
笔记:如果想让 FDR 越小越“蓝”(而不是越红),可用
rev(RColorBrewer::brewer.pal(6, 'RdBu'))
反转配色。
layer2 -> layer3
)df_alluv <- df %>%
group_by(layer3) %>%
mutate(weight = Input.number / sum(Input.number)) %>%
ungroup() %>%
select(layer2, layer3, weight)
p_alluvial <- ggplot(
df_alluv,
aes(y = weight, axis1 = layer2, axis2 = layer3)
) +
geom_alluvium(aes(fill = layer2), width = 0.18, alpha = 0.85) +
geom_stratum(width = 0.18, fill = "grey95", color = "grey60") +
geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3, color = "black") +
scale_fill_manual(values = colorRampPalette(RColorBrewer::brewer.pal(11, "Paired"))(nlevels(df$layer2))) +
theme_void() +
theme(legend.position = "none", plot.margin = margin(5.5, 10, 5.5, 5.5))
p_alluvial
左:Alluvial(桑基)图。流宽映射 Input.number;两端节点显示 layer2 与 layer3。
笔记:
y=weight
决定“流宽”;这里直接用Input.number
,与右侧气泡大小保持一致语义。
p_final <- cowplot::plot_grid(p_alluvial, p_bubble, ncol = 2, rel_widths = c(1.9, 1))
p_final
左:Alluvial;右:气泡图。两侧行顺序严格对齐,但存在一定误差,需要借助AI作图工具调节,便于对读。
# 可选保存
# ggsave("sankey_bubble_combo.png", p_final, width = 12, height = 6, dpi = 300)
# ggsave("sankey_bubble_combo.pdf", p_final, width = 12, height = 6)
library(dplyr)
# Top 5 按 count
sum_top <- df %>% arrange(desc(count)) %>%
transmute(layer2, layer3, Input.number, Background.number,
count = round(count, 3), FDR = signif(FDR, 3)) %>%
head(5)
knitr::kable(sum_top, caption = '按 count 排序的前 5 个 pathway(示例数据)。')
layer2 | layer3 | Input.number | Background.number | count | FDR |
---|---|---|---|---|---|
1.7 Glycan | Glycan biosynthesis (general) | 107 | 360 | 0.297 | 0.3400 |
1.11 Xenobiotics | Metabolism of xenobiotics by cytochrome P450 | 74 | 260 | 0.285 | 0.0827 |
1.0 Biosynthesis | Zeatin biosynthesis | 54 | 190 | 0.284 | 0.6260 |
1.9 Metabolism | Limonene and pinene degradation | 106 | 380 | 0.279 | 0.1620 |
1.1 Carbohydrate | Pentose phosphate pathway | 102 | 380 | 0.268 | 0.0541 |
# 哪些上层类目贡献的基因最多?
by_l2 <- df %>% group_by(layer2) %>% summarise(total_input = sum(Input.number), .groups='drop') %>% arrange(desc(total_input))
knitr::kable(by_l2, caption = '各 layer2 的命中基因总数 (Top20 子集内)。')
layer2 | total_input |
---|---|
1.1 Carbohydrate | 225 |
1.11 Xenobiotics | 198 |
1.9 Metabolism | 177 |
1.0 Biosynthesis | 162 |
1.7 Glycan | 157 |
1.3 Lipid metabolism | 144 |
1.2 Energy metabolism | 101 |
1.5 Metabolism | 87 |
解读要点:
count
最高(0.297),对应的命中基因数为 107 个、FDR ≈ 0.34。layer2
),以及流宽代表的基因量级。提示:真实项目中,可把
df
的slice_head(n=20)
改成阈值(如 FDR < 0.05)或其它排序逻辑(如取Input.number
最大的 TopN),以适配你的论文或报告需求。
slice_head(n = 20)
或加过滤条件(如
filter(FDR < 0.05)
)。scale_color_gradientn(colours = rev(brewer.pal(6, 'RdBu')))
可反转。scale_fill_manual(values = ...)
可换调色板或指定少量命名色。layer3
的因子顺序;axis2 = layer3
使用同一个因子,确保左右严格对齐。ggsave()
分别导出
PNG/PDF;期刊排版建议 PDF(矢量)。geom_alluvium(..., alpha)
、geom_stratum(width=...)
、标签
size
等参数,或给右图加网格线增强对齐。sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Asia/Shanghai
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cowplot_1.2.0 RColorBrewer_1.1-3 ggalluvial_0.12.5 lubridate_1.9.4
## [5] forcats_1.0.0 stringr_1.5.2 dplyr_1.1.4 purrr_1.1.0
## [9] readr_2.1.5 tidyr_1.3.1 tibble_3.3.0 ggplot2_4.0.0
## [13] tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.4.2 tidyselect_1.2.1
## [5] jquerylib_0.1.4 scales_1.4.0 yaml_2.3.10 fastmap_1.2.0
## [9] R6_2.6.1 labeling_0.4.3 generics_0.1.4 knitr_1.50
## [13] bslib_0.9.0 pillar_1.11.1 tzdb_0.5.0 rlang_1.1.6
## [17] stringi_1.8.7 cachem_1.1.0 xfun_0.53 sass_0.4.10
## [21] S7_0.2.0 timechange_0.3.0 cli_3.6.5 withr_3.0.2
## [25] magrittr_2.0.4 digest_0.6.37 grid_4.4.2 rstudioapi_0.17.1
## [29] hms_1.1.3 lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.5
## [33] glue_1.8.0 farver_2.1.2 rmarkdown_2.29 tools_4.4.2
## [37] pkgconfig_2.0.3 htmltools_0.5.8.1