传统富集气泡图通常同时表达 4 个维度:
Description(y 轴)、GeneRatio/count(x 轴)、Input.number(气泡大小)、P.adjust/FDR(颜色)。 本文在气泡图左侧加入两层 桑基/Alluvial(layer2 -> layer3),把每个 pathway 中的 命中基因数(Input.number) 映射为流的宽度,从而实现第 5 个维度(“基因-通路归属的流量”)的直观呈现。
count = Input/Background,size =
Input.number,color = FDR。layer2 → layer3 的 alluvial,流宽 =
Input.number;节点标签显示两端层级。layer3 的位置与右侧气泡的行一一对应。# 如缺包会自动安装(可按需注释)
need <- c("tidyverse","ggalluvial","RColorBrewer","cowplot")
for (p in need) if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
invisible(lapply(need, library, character.only = TRUE))
set.seed(2025)
笔记:
ggsankey在部分 R 版本不可用,因此本文统一采用 CRAN 上更稳的ggalluvial。关键几何对象: -geom_alluvium():画流(可映射y为权重/宽度) -geom_stratum():画节点 -geom_text(stat="stratum"):给节点自动加标签
grp <- list(
  "1.0 Biosynthesis" =
    c("Prodigiosin biosynthesis","Benzoxazinoid biosynthesis","Zeatin biosynthesis"),
  "1.1 Carbohydrate" =
    c("Glyoxylate and dicarboxylate metabolism","Polyketide sugar unit biosynthesis","Pentose phosphate pathway"),
  "1.2 Energy metabolism" =
    c("Nitrogen metabolism","Sulfur metabolism"),
  "1.3 Lipid metabolism" =
    c("Glycerolipid metabolism","Arachidonic acid metabolism","Linoleic acid metabolism"),
  "1.5 Metabolism" =
    c("Cyanamino acid metabolism","Fluorobenzoate degradation","Naphthalene degradation"),
  "1.7 Glycan" =
    c("Glycosaminoglycan degradation","Glycan biosynthesis (general)"),
  "1.9 Metabolism" =
    c("Limonene and pinene degradation","Polyketide backbone biosynthesis"),
  "1.11 Xenobiotics" =
    c("Metabolism of xenobiotics by cytochrome P450","Chloroalkane and chloroalkene degradation","Toluene degradation")
)
enrich_demo <- purrr::imap_dfr(grp, ~ tibble(layer2 = .y, layer3 = .x)) %>%
  mutate(
    Background.number = sample(seq(160, 420, by = 10), n(), replace = TRUE),
    Input.number      = pmax(3L, round(Background.number * runif(n(), 0.08, 0.30))),
    FDR               = pmin(1, stats::rbeta(n(), 0.8, 3.2))
  )
enrich_demo %>% dplyr::sample_n(5)
字段要求(如换成你自己的数据): -
layer2:上层类目(如 KEGG 2 级) -layer3:pathway 名称(右图 y 轴 & 左图右侧节点) -Input.number:命中基因数(控制气泡大小 & 流宽) -Background.number:背景数(用于计算count = Input/Background) -FDR:校正后的 p 值(控制气泡颜色)
读取你自己的文件示例(制表符分隔):
library(readr) raw <- readr::read_tsv("DEG.Ko.enrich.xls") # 选取并重命名为本文所需字段名 df0 <- raw %>% dplyr::select(layer2, layer3, Input.number, Background.number, FDR)
# 计算 count,选前 20 个 pathway(可改 n)
df <- enrich_demo %>%
  mutate(count = Input.number / Background.number) %>%
  arrange(desc(count)) %>%
  slice_head(n = 20)
# 为了让左、右两图对齐:把 layer3 设为“反向因子”(上到下与气泡行一致)
df$layer3 <- factor(df$layer3, levels = df$layer3 %>% unique() %>% rev())
df$layer2 <- factor(df$layer2, levels = df$layer2 %>% unique())
# 预览(前 10 行)
df %>% select(layer2, layer3, Input.number, Background.number, count, FDR) %>% head(10)
p_bubble <- df %>%
  ggplot(aes(x = count, y = layer3)) +
  geom_point(aes(size = Input.number, color = FDR, fill = FDR), shape = 19) +
  scale_color_gradientn(colours = RColorBrewer::brewer.pal(6, "RdBu")) +
  scale_fill_gradientn(colours = RColorBrewer::brewer.pal(6, "RdBu")) +
  guides(size = guide_legend(title = "Count")) +
  theme_bw() +
  theme(
    axis.title   = element_blank(),
    axis.text.y  = element_blank(),   # y 轴文字交给左侧节点
    axis.ticks.y = element_blank(),
    legend.key   = element_blank(),
    legend.title = element_text(size = 10),
    legend.text  = element_text(size = 8),
    plot.margin  = margin(5.5, 5.5, 5.5, 0)
  )
p_bubble
右:富集气泡图。x=count=Input/Background;气泡大小=Input.number;颜色=FDR (RdBu 渐变)。
笔记:如果想让 FDR 越小越“蓝”(而不是越红),可用
rev(RColorBrewer::brewer.pal(6, 'RdBu'))反转配色。
layer2 -> layer3)df_alluv <- df %>%
  group_by(layer3) %>%
  mutate(weight = Input.number / sum(Input.number)) %>%  
  ungroup() %>%
  select(layer2, layer3, weight)
p_alluvial <- ggplot(
  df_alluv,
  aes(y = weight, axis1 = layer2, axis2 = layer3)
) +
  geom_alluvium(aes(fill = layer2), width = 0.18, alpha = 0.85) +
  geom_stratum(width = 0.18, fill = "grey95", color = "grey60") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3, color = "black") +
  scale_fill_manual(values = colorRampPalette(RColorBrewer::brewer.pal(11, "Paired"))(nlevels(df$layer2))) +
  theme_void() +
  theme(legend.position = "none", plot.margin = margin(5.5, 10, 5.5, 5.5))
p_alluvial
左:Alluvial(桑基)图。流宽映射 Input.number;两端节点显示 layer2 与 layer3。
笔记:
y=weight决定“流宽”;这里直接用Input.number,与右侧气泡大小保持一致语义。
p_final <- cowplot::plot_grid(p_alluvial, p_bubble, ncol = 2, rel_widths = c(1.9, 1))
p_final
左:Alluvial;右:气泡图。两侧行顺序严格对齐,但存在一定误差,需要借助AI作图工具调节,便于对读。
# 可选保存
# ggsave("sankey_bubble_combo.png", p_final, width = 12, height = 6, dpi = 300)
# ggsave("sankey_bubble_combo.pdf",  p_final, width = 12, height = 6)
library(dplyr)
# Top 5 按 count
sum_top <- df %>% arrange(desc(count)) %>%
  transmute(layer2, layer3, Input.number, Background.number,
            count = round(count, 3), FDR = signif(FDR, 3)) %>%
  head(5)
knitr::kable(sum_top, caption = '按 count 排序的前 5 个 pathway(示例数据)。')
| layer2 | layer3 | Input.number | Background.number | count | FDR | 
|---|---|---|---|---|---|
| 1.7 Glycan | Glycan biosynthesis (general) | 107 | 360 | 0.297 | 0.3400 | 
| 1.11 Xenobiotics | Metabolism of xenobiotics by cytochrome P450 | 74 | 260 | 0.285 | 0.0827 | 
| 1.0 Biosynthesis | Zeatin biosynthesis | 54 | 190 | 0.284 | 0.6260 | 
| 1.9 Metabolism | Limonene and pinene degradation | 106 | 380 | 0.279 | 0.1620 | 
| 1.1 Carbohydrate | Pentose phosphate pathway | 102 | 380 | 0.268 | 0.0541 | 
# 哪些上层类目贡献的基因最多?
by_l2 <- df %>% group_by(layer2) %>% summarise(total_input = sum(Input.number), .groups='drop') %>% arrange(desc(total_input))
knitr::kable(by_l2, caption = '各 layer2 的命中基因总数 (Top20 子集内)。')
| layer2 | total_input | 
|---|---|
| 1.1 Carbohydrate | 225 | 
| 1.11 Xenobiotics | 198 | 
| 1.9 Metabolism | 177 | 
| 1.0 Biosynthesis | 162 | 
| 1.7 Glycan | 157 | 
| 1.3 Lipid metabolism | 144 | 
| 1.2 Energy metabolism | 101 | 
| 1.5 Metabolism | 87 | 
解读要点:
count
最高(0.297),对应的命中基因数为 107 个、FDR ≈ 0.34。layer2),以及流宽代表的基因量级。提示:真实项目中,可把
df的slice_head(n=20)改成阈值(如 FDR < 0.05)或其它排序逻辑(如取Input.number最大的 TopN),以适配你的论文或报告需求。
slice_head(n = 20) 或加过滤条件(如
filter(FDR < 0.05))。scale_color_gradientn(colours = rev(brewer.pal(6, 'RdBu')))
可反转。scale_fill_manual(values = ...)
可换调色板或指定少量命名色。layer3
的因子顺序;axis2 = layer3
使用同一个因子,确保左右严格对齐。ggsave() 分别导出
PNG/PDF;期刊排版建议 PDF(矢量)。geom_alluvium(..., alpha)、geom_stratum(width=...)、标签
size 等参数,或给右图加网格线增强对齐。sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Asia/Shanghai
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] cowplot_1.2.0      RColorBrewer_1.1-3 ggalluvial_0.12.5  lubridate_1.9.4   
##  [5] forcats_1.0.0      stringr_1.5.2      dplyr_1.1.4        purrr_1.1.0       
##  [9] readr_2.1.5        tidyr_1.3.1        tibble_3.3.0       ggplot2_4.0.0     
## [13] tidyverse_2.0.0   
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6      jsonlite_2.0.0    compiler_4.4.2    tidyselect_1.2.1 
##  [5] jquerylib_0.1.4   scales_1.4.0      yaml_2.3.10       fastmap_1.2.0    
##  [9] R6_2.6.1          labeling_0.4.3    generics_0.1.4    knitr_1.50       
## [13] bslib_0.9.0       pillar_1.11.1     tzdb_0.5.0        rlang_1.1.6      
## [17] stringi_1.8.7     cachem_1.1.0      xfun_0.53         sass_0.4.10      
## [21] S7_0.2.0          timechange_0.3.0  cli_3.6.5         withr_3.0.2      
## [25] magrittr_2.0.4    digest_0.6.37     grid_4.4.2        rstudioapi_0.17.1
## [29] hms_1.1.3         lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.5   
## [33] glue_1.8.0        farver_2.1.2      rmarkdown_2.29    tools_4.4.2      
## [37] pkgconfig_2.0.3   htmltools_0.5.8.1