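Goal: go from an empty set of axes to a finished scatter plot with every element in place (background regions + points + linear fit + correlation coefficient + labels). The code builds on
ggplot2, ggpubr, and ggrepel, and reuses one set of aesthetic mappings throughout.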
# If the packages are not installed yet, uncomment the next line to install them
# install.packages(c("ggplot2","ggpubr","ggrepel","dplyr","tidyr"))
library(ggplot2)
library(ggpubr)   # stat_cor
library(ggrepel)  # non-overlapping labels
library(dplyr)
library(tidyr)
library(grid)     # unit() is used for the ggrepel padding arguments
set.seed(1001)
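Build a pan-cancer example data frame, Radio, with these columns:
- NES: normalized enrichment score of the inflammatory response (x-axis, roughly -3 to 3);
- Radio: score ratio of activated B cells (y-axis, roughly 0.75 to 1.75);
- Cancer: tumor type abbreviation.
To build in a positive correlation, we set
Radio = 1 + 0.12*NES + noise (Gaussian, sd = 0.18) and clip the result to [0.75, 1.75].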
cancers <- c(
  "ACC","BLCA","BRCA","CESC","CHOL","COAD","DLBC","ESCA","GBM","HNSC",
  "KICH","KIRC","KIRP","LGG","LIHC","LUAD","LUSC","MESO","OV","PAAD",
  "PCPG","PRAD","READ","SARC","SKCM","STAD","TGCT","THCA","THYM","UCEC",
  "UCS","UVM","KIPAN","COADREAD","STES"
)
n <- length(cancers)
NES <- pmax(-3, pmin(3, rnorm(n, mean = 1.0, sd = 1.8)))
RadioScore <- 1 + 0.12 * NES + rnorm(n, sd = 0.18)
RadioScore <- pmax(0.75, pmin(1.75, RadioScore))
Radio <- data.frame(
  Radio  = round(RadioScore, 3),
  NES    = round(NES, 3),
  Cancer = cancers
)
# Preview and save (for reuse elsewhere)
head(Radio)
write.csv(Radio, "radio_demo.csv", row.names = FALSE)
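The clipping used above relies on nesting pmin() inside pmax(). As a minimal base R sketch of that idiom (the helper name clamp and the toy vector are illustrative and not part of the original code):

```r
# clamp() is a hypothetical helper wrapping the pmax(lo, pmin(hi, x)) idiom
clamp <- function(x, lo, hi) pmax(lo, pmin(hi, x))

toy <- c(-4.2, -3, 0.5, 3, 5.1)          # toy values, not real NES scores
clamped <- clamp(toy, lo = -3, hi = 3)   # out-of-range values move to the nearest bound
print(clamped)
```

One side effect to keep in mind: clipped values pile up exactly at the bounds, which creates ties for rank-based statistics such as Spearman's rho.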
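The plot is built in four steps:
- Empty axes with theme_bw().
- annotate("rect", ...) places two category background regions; they are added first so they do not cover the points and the line.
- geom_point() + geom_smooth(method = "lm") draw the points and the linear fit; stat_cor() prints the correlation coefficient and its significance.
- geom_label_repel() keeps labels from overlapping, given reasonable force/box.padding/point.padding values.
p1 <- ggplot(data = Radio, aes(x = NES, y = Radio)) +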
  scale_x_continuous(limits = c(-3, 3), breaks = seq(-3, 3, 1)) +
  scale_y_continuous(limits = c(0.75, 1.75), breaks = seq(0.75, 1.75, 0.25)) +
  labs(x = "Inflammatory response (NES)",
       y = "Score Ratio of Activated B cell",
       title = NULL) +
  theme_bw(base_size = 22)
p1
p2 <- p1 +
  annotate("rect", xmin = -Inf, xmax = 0,  ymin = -Inf, ymax = 1,
           fill = "#EEF0FA", alpha = 0.9) +  # soft indigo
  annotate("rect", xmin = 0,   xmax = Inf, ymin = 1,    ymax = Inf,
           fill = "#EAF6F3", alpha = 0.9)    # soft teal
p2
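ggplot2 draws layers in the order they are added, which is why the rectangles go in before any points or lines. A small sketch that lists the geoms in drawing order, assuming a made-up three-row data frame (toy_df is illustrative only):

```r
library(ggplot2)

toy_df <- data.frame(x = c(-1, 0, 1), y = c(0.9, 1.0, 1.1))  # illustrative data only

p <- ggplot(toy_df, aes(x, y)) +
  annotate("rect", xmin = -Inf, xmax = 0, ymin = -Inf, ymax = 1,
           fill = "#EEF0FA", alpha = 0.9) +   # background layer, added first
  geom_point()                                 # points, added second, drawn on top

# Geom class of each layer, in the order the layers are drawn
layer_geoms <- unname(vapply(p$layers, function(l) class(l$geom)[1], character(1)))
print(layer_geoms)
```

If the rectangle layer were listed after the point layer, the semi-transparent fill would tint the points underneath it.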
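This step uses the Spearman correlation,
and the stat_cor label format matches the example. Note that the line is an ordinary least-squares fit while the reported coefficient is rank-based, so the two summarize the relationship in different ways.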
p3 <- p2 +
  geom_point(size = 4, color = "#0072B2") +   # Okabe-Ito blue
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE,
              linewidth = 2, show.legend = FALSE,  # use size = 2 on ggplot2 < 3.4.0
              color = "#D55E00",              # Okabe-Ito orange (fit line)
              fill  = "#F3CDB2") +            # light orange-beige confidence band
  stat_cor(aes(label = paste(..r.label.., ..p.label.., sep = "~`,`~")),
           method = "spearman",
           label.x.npc = "left", label.y.npc = "top", size = 8,
           show.legend = FALSE)
p3
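To cross-check the coefficient that stat_cor() prints, the same Spearman statistic can be computed directly with cor.test() from base R. This is only a sketch: the toy data frame below is simulated so the block runs on its own, and its seed and size are arbitrary choices, not values from the tutorial.

```r
set.seed(1)  # illustrative seed, unrelated to the one used for Radio
toy <- data.frame(NES = rnorm(30))
toy$Radio <- 1 + 0.12 * toy$NES + rnorm(30, sd = 0.18)

# exact = FALSE requests the asymptotic p-value, which also avoids the
# "cannot compute exact p-value with ties" warning that clipped data can trigger
res  <- cor.test(toy$NES, toy$Radio, method = "spearman", exact = FALSE)
rho  <- unname(res$estimate)   # Spearman's rho
pval <- res$p.value
cat(sprintf("rho = %.2f, p = %.3g\n", rho, pval))
```

With the real data, replace toy with Radio. The coefficient should agree with the plot annotation up to rounding; the p-value may differ slightly if the exact-test setting is not the same.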
p4 <- p3 +
  geom_label_repel(aes(label = Cancer),
                   seed = 1000, color = "black", show.legend = FALSE,
                   min.segment.length = 0.1,      # set to Inf to drop the leader segments
                   force = 2,                      # repulsion between labels
                   force_pull = 1,                 # attraction between a label and its point
                   size = 5,
                   box.padding = unit(0.6, "lines"),
                   point.padding = unit(0.5, "lines"),
                   max.overlaps = Inf)
p4
ggsave("scatter_full_features.png", p4, width = 7.2, height = 6.2, dpi = 300)
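- stat_cor places the correlation annotation in a single call.
- geom_label_repel keeps the labels readable and free of overlaps.
- theme_bw(base_size = 22) keeps the text legible at publication size.
- Reproducibility comes from set.seed() plus saving the data as radio_demo.csv.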
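The saved CSV can be verified with a quick round trip. A minimal sketch, assuming a temporary file and a three-row stand-in for Radio (both are made up here so the block is self-contained):

```r
# Three-row stand-in with the same column layout as Radio (values are invented)
demo <- data.frame(
  Radio  = c(1.012, 0.875, 1.430),
  NES    = c(0.250, -1.100, 2.300),
  Cancer = c("ACC", "BLCA", "BRCA"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")             # temporary path, not radio_demo.csv
write.csv(demo, path, row.names = FALSE)       # same call pattern as in the tutorial
reloaded <- read.csv(path, stringsAsFactors = FALSE)

same <- isTRUE(all.equal(demo, reloaded))      # TRUE when the data survive the round trip
print(same)
```

Rounding the scores to three decimals before saving, as the tutorial does, helps the values survive the text round trip unchanged.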