1 1. 背景与目标

传统富集气泡图通常同时表达 4 个维度:Description(y 轴)、GeneRatio/count(x 轴)、Input.number(气泡大小)、P.adjust/FDR(颜色)。 本文在气泡图左侧加入两层 桑基/Alluviallayer2 -> layer3),把每个 pathway 中的 命中基因数(Input.number) 映射为流的宽度,从而实现第 5 个维度(“基因-通路归属的流量”)的直观呈现。

2 2. 环境准备

# 如缺包会自动安装(可按需注释)
need <- c("tidyverse","ggalluvial","RColorBrewer","cowplot")
for (p in need) if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
invisible(lapply(need, library, character.only = TRUE))
set.seed(2025)

笔记ggsankey 在部分 R 版本不可用,因此本文统一采用 CRAN 上更稳的 ggalluvial。关键几何对象: - geom_alluvium():画流(可映射 y 为权重/宽度) - geom_stratum():画节点 - geom_text(stat="stratum"):给节点自动加标签

3 3. 数据与字段说明

3.1 3.1 示例数据(可替换为你的富集结果)

grp <- list(
  "1.0 Biosynthesis" =
    c("Prodigiosin biosynthesis","Benzoxazinoid biosynthesis","Zeatin biosynthesis"),
  "1.1 Carbohydrate" =
    c("Glyoxylate and dicarboxylate metabolism","Polyketide sugar unit biosynthesis","Pentose phosphate pathway"),
  "1.2 Energy metabolism" =
    c("Nitrogen metabolism","Sulfur metabolism"),
  "1.3 Lipid metabolism" =
    c("Glycerolipid metabolism","Arachidonic acid metabolism","Linoleic acid metabolism"),
  "1.5 Metabolism" =
    c("Cyanamino acid metabolism","Fluorobenzoate degradation","Naphthalene degradation"),
  "1.7 Glycan" =
    c("Glycosaminoglycan degradation","Glycan biosynthesis (general)"),
  "1.9 Metabolism" =
    c("Limonene and pinene degradation","Polyketide backbone biosynthesis"),
  "1.11 Xenobiotics" =
    c("Metabolism of xenobiotics by cytochrome P450","Chloroalkane and chloroalkene degradation","Toluene degradation")
)

enrich_demo <- purrr::imap_dfr(grp, ~ tibble(layer2 = .y, layer3 = .x)) %>%
  mutate(
    Background.number = sample(seq(160, 420, by = 10), n(), replace = TRUE),
    Input.number      = pmax(3L, round(Background.number * runif(n(), 0.08, 0.30))),
    FDR               = pmin(1, stats::rbeta(n(), 0.8, 3.2))
  )

enrich_demo %>% dplyr::sample_n(5)

字段要求(如换成你自己的数据): - layer2:上层类目(如 KEGG 2 级) - layer3:pathway 名称(右图 y 轴 & 左图右侧节点) - Input.number:命中基因数(控制气泡大小 & 流宽) - Background.number:背景数(用于计算 count = Input/Background) - FDR:校正后的 p 值(控制气泡颜色)

读取你自己的文件示例(制表符分隔):

library(readr)
raw <- readr::read_tsv("DEG.Ko.enrich.xls")
# 选取并重命名为本文所需字段名
df0 <- raw %>% dplyr::select(layer2, layer3, Input.number, Background.number, FDR)

3.2 3.2 指标计算与 TopN 选择

# 计算 count,选前 20 个 pathway(可改 n)
df <- enrich_demo %>%
  mutate(count = Input.number / Background.number) %>%
  arrange(desc(count)) %>%
  slice_head(n = 20)

# 为了让左、右两图对齐:把 layer3 设为“反向因子”(上到下与气泡行一致)
df$layer3 <- factor(df$layer3, levels = df$layer3 %>% unique() %>% rev())
df$layer2 <- factor(df$layer2, levels = df$layer2 %>% unique())

# 预览(前 10 行)
df %>% select(layer2, layer3, Input.number, Background.number, count, FDR) %>% head(10)

4 4. 可视化:气泡图 + Alluvial

4.1 4.1 右侧:富集气泡图

p_bubble <- df %>%
  ggplot(aes(x = count, y = layer3)) +
  geom_point(aes(size = Input.number, color = FDR, fill = FDR), shape = 19) +
  scale_color_gradientn(colours = RColorBrewer::brewer.pal(6, "RdBu")) +
  scale_fill_gradientn(colours = RColorBrewer::brewer.pal(6, "RdBu")) +
  guides(size = guide_legend(title = "Count")) +
  theme_bw() +
  theme(
    axis.title   = element_blank(),
    axis.text.y  = element_blank(),   # y 轴文字交给左侧节点
    axis.ticks.y = element_blank(),
    legend.key   = element_blank(),
    legend.title = element_text(size = 10),
    legend.text  = element_text(size = 8),
    plot.margin  = margin(5.5, 5.5, 5.5, 0)
  )

p_bubble
右:富集气泡图。x=count=Input/Background;气泡大小=Input.number;颜色=FDR (RdBu 渐变)。

右:富集气泡图。x=count=Input/Background;气泡大小=Input.number;颜色=FDR (RdBu 渐变)。

笔记:如果想让 FDR 越小越“蓝”(而不是越红),可用 rev(RColorBrewer::brewer.pal(6, 'RdBu')) 反转配色。

4.2 4.2 左侧:两层 Alluvial(layer2 -> layer3

df_alluv <- df %>%
  group_by(layer3) %>%
  mutate(weight = Input.number / sum(Input.number)) %>%  
  ungroup() %>%
  select(layer2, layer3, weight)

p_alluvial <- ggplot(
  df_alluv,
  aes(y = weight, axis1 = layer2, axis2 = layer3)
) +
  geom_alluvium(aes(fill = layer2), width = 0.18, alpha = 0.85) +
  geom_stratum(width = 0.18, fill = "grey95", color = "grey60") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3, color = "black") +
  scale_fill_manual(values = colorRampPalette(RColorBrewer::brewer.pal(11, "Paired"))(nlevels(df$layer2))) +
  theme_void() +
  theme(legend.position = "none", plot.margin = margin(5.5, 10, 5.5, 5.5))

p_alluvial
左:Alluvial(桑基)图。流宽映射 Input.number;两端节点显示 layer2 与 layer3。

左:Alluvial(桑基)图。流宽映射 Input.number;两端节点显示 layer2 与 layer3。

笔记y=weight 决定“流宽”;这里直接用 Input.number,与右侧气泡大小保持一致语义。

4.3 4.3 拼接为一张图

p_final <- cowplot::plot_grid(p_alluvial, p_bubble, ncol = 2, rel_widths = c(1.9, 1))
p_final
左:Alluvial;右:气泡图。两侧行顺序严格对齐,但存在一定误差,需要借助AI作图工具调节,便于对读。

左:Alluvial;右:气泡图。两侧行顺序严格对齐,但存在一定误差,需要借助AI作图工具调节,便于对读。

# 可选保存
# ggsave("sankey_bubble_combo.png", p_final, width = 12, height = 6, dpi = 300)
# ggsave("sankey_bubble_combo.pdf",  p_final, width = 12, height = 6)

5 5. 结果描述

library(dplyr)
# Top 5 按 count
sum_top <- df %>% arrange(desc(count)) %>%
  transmute(layer2, layer3, Input.number, Background.number,
            count = round(count, 3), FDR = signif(FDR, 3)) %>%
  head(5)
knitr::kable(sum_top, caption = '按 count 排序的前 5 个 pathway(示例数据)。')
按 count 排序的前 5 个 pathway(示例数据)。
layer2 layer3 Input.number Background.number count FDR
1.7 Glycan Glycan biosynthesis (general) 107 360 0.297 0.3400
1.11 Xenobiotics Metabolism of xenobiotics by cytochrome P450 74 260 0.285 0.0827
1.0 Biosynthesis Zeatin biosynthesis 54 190 0.284 0.6260
1.9 Metabolism Limonene and pinene degradation 106 380 0.279 0.1620
1.1 Carbohydrate Pentose phosphate pathway 102 380 0.268 0.0541
# 哪些上层类目贡献的基因最多?
by_l2 <- df %>% group_by(layer2) %>% summarise(total_input = sum(Input.number), .groups='drop') %>% arrange(desc(total_input))
knitr::kable(by_l2, caption = '各 layer2 的命中基因总数 (Top20 子集内)。')
各 layer2 的命中基因总数 (Top20 子集内)。
layer2 total_input
1.1 Carbohydrate 225
1.11 Xenobiotics 198
1.9 Metabolism 177
1.0 Biosynthesis 162
1.7 Glycan 157
1.3 Lipid metabolism 144
1.2 Energy metabolism 101
1.5 Metabolism 87

解读要点:

提示:真实项目中,可把 dfslice_head(n=20) 改成阈值(如 FDR < 0.05)或其它排序逻辑(如取 Input.number 最大的 TopN),以适配你的论文或报告需求。

6 6. 教程与常见改动

  1. 替换为你的数据:按 3.1 中的读取示例,保证字段名对应。
  2. TopN/筛选:在 3.2 调整 slice_head(n = 20) 或加过滤条件(如 filter(FDR < 0.05))。
  3. 颜色
    • FDR 渐变:scale_color_gradientn(colours = rev(brewer.pal(6, 'RdBu'))) 可反转。
    • Alluvial 填充:scale_fill_manual(values = ...) 可换调色板或指定少量命名色。
  4. 顺序对齐(关键):
    • 先在气泡图数据里确定 layer3因子顺序
    • 再把 alluvial 的 axis2 = layer3 使用同一个因子,确保左右严格对齐。
  5. 导出:用 ggsave() 分别导出 PNG/PDF;期刊排版建议 PDF(矢量)。
  6. 可读性:可调 geom_alluvium(..., alpha)geom_stratum(width=...)、标签 size 等参数,或给右图加网格线增强对齐。

7 7. Session info(可用于复现)

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Asia/Shanghai
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] cowplot_1.2.0      RColorBrewer_1.1-3 ggalluvial_0.12.5  lubridate_1.9.4   
##  [5] forcats_1.0.0      stringr_1.5.2      dplyr_1.1.4        purrr_1.1.0       
##  [9] readr_2.1.5        tidyr_1.3.1        tibble_3.3.0       ggplot2_4.0.0     
## [13] tidyverse_2.0.0   
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6      jsonlite_2.0.0    compiler_4.4.2    tidyselect_1.2.1 
##  [5] jquerylib_0.1.4   scales_1.4.0      yaml_2.3.10       fastmap_1.2.0    
##  [9] R6_2.6.1          labeling_0.4.3    generics_0.1.4    knitr_1.50       
## [13] bslib_0.9.0       pillar_1.11.1     tzdb_0.5.0        rlang_1.1.6      
## [17] stringi_1.8.7     cachem_1.1.0      xfun_0.53         sass_0.4.10      
## [21] S7_0.2.0          timechange_0.3.0  cli_3.6.5         withr_3.0.2      
## [25] magrittr_2.0.4    digest_0.6.37     grid_4.4.2        rstudioapi_0.17.1
## [29] hms_1.1.3         lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.5   
## [33] glue_1.8.0        farver_2.1.2      rmarkdown_2.29    tools_4.4.2      
## [37] pkgconfig_2.0.3   htmltools_0.5.8.1