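Goal: go from an empty set of axes to a finished scatter plot with every element in place (background zones, points, a linear fit, a correlation coefficient, and labels). The code builds on ggplot2, ggpubr, and ggrepel, and every layer reuses the same aesthetic mapping.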

1. Setup

# Uncomment the next line to install any packages missing on this machine
# install.packages(c("ggplot2","ggpubr","ggrepel","dplyr","tidyr"))

library(ggplot2)
library(ggpubr)   # stat_cor()
library(ggrepel)  # non-overlapping labels
library(dplyr)
library(tidyr)
library(grid)     # unit(), used for the ggrepel padding arguments

set.seed(1001)

1.1 Simulate a reproducible raw dataset

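Build a pan-cancer example data frame named Radio with three columns:
- NES: normalized enrichment score for the inflammatory response (x-axis, roughly -3 to 3);
- Radio: score ratio of activated B cells (y-axis, roughly 0.75 to 1.75);
- Cancer: tumor type abbreviation.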

To guarantee a positive correlation, we set Radio = 1 + 0.12*NES + noise (Gaussian, sd = 0.18) and clip the result to 0.75–1.75.

cancers <- c(
  "ACC","BLCA","BRCA","CESC","CHOL","COAD","DLBC","ESCA","GBM","HNSC",
  "KICH","KIRC","KIRP","LGG","LIHC","LUAD","LUSC","MESO","OV","PAAD",
  "PCPG","PRAD","READ","SARC","SKCM","STAD","TGCT","THCA","THYM","UCEC",
  "UCS","UVM","KIPAN","COADREAD","STES"
)

n <- length(cancers)
NES <- pmax(-3, pmin(3, rnorm(n, mean = 1.0, sd = 1.8)))
RadioScore <- 1 + 0.12 * NES + rnorm(n, sd = 0.18)
RadioScore <- pmax(0.75, pmin(1.75, RadioScore))

Radio <- data.frame(
  Radio  = round(RadioScore, 3),
  NES    = round(NES, 3),
  Cancer = cancers
)

# Preview the data and save it for reuse elsewhere
head(Radio)
write.csv(Radio, "radio_demo.csv", row.names = FALSE)
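
Before plotting, it can help to confirm that a simulated data frame really respects the ranges described above. The sketch below is only an illustration: check_demo_data() is a hypothetical helper that is not part of the original tutorial, and the toy data frame (including its placeholder labels A to E) is made up just to show the call. It uses base R only.

```r
# Hypothetical helper (an assumption, not from the tutorial): checks the
# column names and value ranges, then returns the least-squares slope.
check_demo_data <- function(df) {
  stopifnot(
    all(c("NES", "Radio", "Cancer") %in% names(df)),
    all(df$NES >= -3 & df$NES <= 3),
    all(df$Radio >= 0.75 & df$Radio <= 1.75),
    !anyDuplicated(df$Cancer)          # each tumor type appears once
  )
  # Slope of Radio ~ NES; a positive value means a positive linear trend
  unname(coef(lm(Radio ~ NES, data = df))["NES"])
}

# Tiny hand-made data frame, used only to demonstrate the helper
toy <- data.frame(
  NES    = c(-2, -1, 0, 1, 2),
  Radio  = c(0.80, 0.90, 1.00, 1.10, 1.25),
  Cancer = c("A", "B", "C", "D", "E")
)

slope <- check_demo_data(toy)
slope > 0
```

With the tutorial's own data the call would be check_demo_data(Radio); the fitted slope should land near the 0.12 used in the simulation, although noise and clipping will move it somewhat.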

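2. Plotting approach (quick notes)

  1. Start from an empty plot: fix the axis ranges and breaks, and use theme_bw() as the common theme.
  2. Add the shading first: place the two background zones with annotate("rect", ...) so that they sit underneath the points and the line instead of covering them.
  3. Add the points and the fit: geom_point() + geom_smooth(method = "lm"); stat_cor() prints the correlation coefficient and its significance.
  4. Add the labels: geom_label_repel() keeps them from overlapping; tune force, box.padding, and point.padding to sensible values.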
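
The reason the shading has to come first is that ggplot2 draws layers in the order they are added. The following minimal sketch, built on made-up toy data, lists the geoms of a small plot in drawing order. It reads the plot's layers element, which I assume is available on the installed ggplot2 version.

```r
library(ggplot2)

# Toy data, made up for this illustration only
toy <- data.frame(x = c(-1, 0, 1), y = c(0.9, 1.0, 1.1))

p_toy <- ggplot(toy, aes(x, y)) +
  annotate("rect", xmin = -Inf, xmax = 0, ymin = -Inf, ymax = 1,
           fill = "grey90") +                               # added first: drawn at the bottom
  geom_point() +                                            # drawn on top of the rectangle
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)   # drawn last

# Layers are stored, and rendered, in the order they were added
layer_geoms <- vapply(p_toy$layers, function(l) class(l$geom)[1], character(1))
layer_geoms
```

If the annotate() call were moved after geom_point(), the rectangle would be painted over the points, which is exactly what step 2 is meant to avoid.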

3. Step-by-step code (p1 → p4)

3.1 p1: create an empty plot (axes only)

p1 <- ggplot(data = Radio, aes(x = NES, y = Radio)) +
  scale_x_continuous(limits = c(-3, 3), breaks = seq(-3, 3, 1)) +
  scale_y_continuous(limits = c(0.75, 1.75), breaks = seq(0.75, 1.75, 0.25)) +
  labs(x = "Inflammatory response (NES)",
       y = "Score Ratio of Activated B cell",
       title = NULL) +
  theme_bw(base_size = 22)

p1

3.2 p2: add the background zone layer (two shaded blocks)

p2 <- p1 +
  annotate("rect", xmin = -Inf, xmax = 0,  ymin = -Inf, ymax = 1,
           fill = "#EEF0FA", alpha = 0.9) +  # soft indigo
  annotate("rect", xmin = 0,   xmax = Inf, ymin = 1,    ymax = Inf,
           fill = "#EAF6F3", alpha = 0.9)    # soft teal

p2

3.3 p3: add the points, fitted line, and correlation coefficient

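This step uses Spearman correlation, and the stat_cor() label format matches the example. Note that the fitted line is an ordinary least-squares fit, whereas the reported coefficient is rank-based.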

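p3 <- p2 +
  geom_point(size = 4, color = "#0072B2") +   # Okabe–Ito blue
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE,
              linewidth = 2, show.legend = FALSE,  # ggplot2 >= 3.4.0; use size = 2 on older versions
              color = "#D55E00",              # Okabe–Ito vermillion (fitted line)
              fill  = "#F3CDB2") +            # light orange-beige confidence band
  # after_stat() replaces the deprecated ..r.label.. notation (ggplot2 >= 3.3.0)
  stat_cor(aes(label = paste(after_stat(r.label), after_stat(p.label), sep = "~`,`~")),
           method = "spearman",
           label.x.npc = "left", label.y.npc = "top", size = 8,
           show.legend = FALSE)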

p3
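
Because the label is a Spearman coefficient while the line is a linear fit, the two need not tell the same story. As a base-R sketch on made-up toy vectors (not the tutorial's data), Spearman's rho is simply the Pearson correlation of the ranks, so it can differ from the Pearson r that a straight line would suggest:

```r
# Toy vectors, made up for this illustration only
x <- c(-2.1, -0.4, 0.3, 1.2, 2.8, 3.0)
y <- c(0.80, 1.05, 0.95, 1.20, 1.60, 1.30)

rho_builtin <- cor(x, y, method = "spearman")
rho_by_rank <- cor(rank(x), rank(y))   # Pearson correlation of the ranks
pearson_r   <- cor(x, y)               # what a straight-line view would use

c(spearman = rho_builtin, by_rank = rho_by_rank, pearson = pearson_r)
```

The first two values coincide by definition. If the annotation should describe the fitted line itself, passing method = "pearson" to stat_cor() is the usual alternative.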

3.4 p4: add cancer type labels (non-overlapping)

p4 <- p3 +
  geom_label_repel(aes(label = Cancer),
                   seed = 1000, color = "black", show.legend = FALSE,
                   min.segment.length = 0.1,      # set to Inf to drop the leader lines
                   force = 2,                      # repulsion between overlapping labels
                   force_pull = 1,                 # attraction between a label and its point
                   size = 5,
                   box.padding = unit(0.6, "lines"),
                   point.padding = unit(0.5, "lines"),
                   max.overlaps = Inf)

p4

4. Export the finished figure

ggsave("scatter_full_features.png", p4, width = 7.2, height = 6.2, dpi = 300)
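
ggsave() interprets width and height in inches by default, so the pixel size of the PNG follows from multiplying by dpi. A quick arithmetic sketch with the values used above (the variable names are only illustrative):

```r
width_in  <- 7.2   # inches, as passed to ggsave()
height_in <- 6.2
dpi       <- 300

# Pixel dimensions of the raster output: inches times dots per inch
px <- round(c(width = width_in, height = height_in) * dpi)
px
```

This gives 2160 x 1860 pixels. Raising dpi enlarges the file without changing the relative size of text and points, whereas changing width and height does change how large the 22-point base font looks within the figure.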

5. Summary (checklist)