Plotting in R: Bar Charts, Histograms, Density Curves, Box Plots, Violin Plots, and Grouped Bar Charts

Author

蟹蟹喵

Published

June 6, 2026

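1. Learning Goals

This section covers four common types of plots:

Mean bar charts
Histograms + kernel density curves
Box plots + violin plots
Grouped bar charts

Two packages are used:

ggplot2
dplyr

Where:

ggplot2 draws the plots;
dplyr prepares the data, e.g. computing means and counts.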

2. Installing and Loading the Packages

If the packages are not installed yet, run:

install.packages("ggplot2")
install.packages("dplyr")

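Load the packages. Loading dplyr prints the messages below. They only report that a few functions from stats and base (such as filter() and lag()) are masked by dplyr's versions; they are not errors: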

library(ggplot2)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

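3. Creating Example Data

We create a data set of student scores. The values are in Chinese: class is 一班/二班/三班 (Class 1/2/3), gender is 男/女 (male/female), and group is 实验组/对照组 (experimental/control group). The final step converts class into an ordered factor with levels 一班, 二班, 三班, so the classes appear in that order on the plot axes instead of in the default sort order.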

set.seed(123)

student <- data.frame(
  id = 1:120,
  class = rep(c("一班", "二班", "三班"), each = 40),
  gender = sample(c("男", "女"), 120, replace = TRUE),
  group = sample(c("实验组", "对照组"), 120, replace = TRUE),
  score = c(
    rnorm(40, mean = 78, sd = 8),
    rnorm(40, mean = 84, sd = 7),
    rnorm(40, mean = 88, sd = 6)
  )
)

student$score <- round(student$score, 1)

student$class <- factor(
  student$class,
  levels = c("一班", "二班", "三班"),
  ordered = TRUE
)

head(student, 5)

  id class gender  group score
1  1  一班     男 实验组  78.9
2  2  一班     男 对照组  70.4
3  3  一班     男 实验组  74.1
4  4  一班     女 实验组  76.0
5  5  一班     男 实验组  92.8

View the structure of the data:

str(student)

'data.frame':   120 obs. of  5 variables:
 $ id    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ class : Ord.factor w/ 3 levels "一班"<"二班"<..: 1 1 1 1 1 1 1 1 1 1 ...
 $ gender: chr  "男" "男" "男" "女" ...
 $ group : chr  "实验组" "对照组" "实验组" "实验组" ...
 $ score : num  78.9 70.4 74.1 76 92.8 72.8 79.9 78.6 70.3 77.4 ...

A quick count of students per class:

table(student$class)


一班 二班 三班 
  40   40   40

Count of students by gender:

table(student$gender)


男 女 
65 55

4. Basic ggplot2 Syntax

The basic structure of a ggplot2 plot is:

ggplot(data = <data frame>, aes(x = <x-axis variable>, y = <y-axis variable>)) +
  geom_<plot type>()

For example:

ggplot(data = student, aes(x = class, y = score)) +
  geom_boxplot()

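Where:

Part	Meaning
`ggplot()`	Creates a plot
`data = student`	Uses the student data
`aes()`	Maps variables to the x-axis, y-axis, color, grouping, and so on
`geom_boxplot()`	Draws a box plot
`+`	Adds another layer

Note:

In ggplot2, layers are added with +, not with %>%.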
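The sketch below shows why layers are joined with `+`. It uses a small made-up data frame `toy`, which is hypothetical and not the `student` data. Each `+` returns a new ggplot object with one more layer. Piping the plot into a geom with `%>%` instead hands the whole plot to the geom's first argument, `mapping`, and ggplot2 stops with an error.

```r
library(ggplot2)
library(dplyr)  # only needed here for %>%

# Hypothetical toy data, for illustration only
toy <- data.frame(g = c("A", "A", "B", "B"), y = c(1, 3, 2, 6))

# Build the plot step by step: each `+` adds one layer
p <- ggplot(toy, aes(x = g, y = y))
p <- p + geom_boxplot()
p <- p + geom_point()
n_layers <- length(p$layers)  # 2: the box plot and the points

# Piping into a geom passes the plot as its `mapping` argument, which fails
piped <- tryCatch(
  ggplot(toy, aes(x = g, y = y)) %>% geom_boxplot(),
  error = function(e) e
)
inherits(piped, "error")  # TRUE
```

Saving the plot in an object and adding layers one at a time, as above, is also a handy way to build a complex plot step by step.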

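5. Mean Bar Chart

5.1 What is a mean bar chart?

A mean bar chart compares the average value of different groups.

For example:

comparing the average scores of Class 1, Class 2, and Class 3.

The x-axis is usually the grouping variable and the y-axis the group mean.

5.2 Compute the group means first

Before drawing a mean bar chart, it is best to compute each group's mean first.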

class_mean <- student %>%
  group_by(class) %>%
  summarise(
    mean_score = mean(score),
    sd_score = sd(score),
    n = n(),
    .groups = "drop"
  )

class_mean

# A tibble: 3 × 4
  class mean_score sd_score     n
  <ord>      <dbl>    <dbl> <int>
1 一班        77.2     8.39    40
2 二班        84.1     6.97    40
3 三班        88.1     5.55    40

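Explanation:

Code	Meaning
`group_by(class)`	Groups the data by class
`summarise()`	Computes summaries for each group
`mean(score)`	Mean score
`sd(score)`	Standard deviation
`n()`	Number of students
`.groups = "drop"`	Returns an ungrouped result

5.3 The most basic mean bar chart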

ggplot(class_mean, aes(x = class, y = mean_score)) +
  geom_col()

Explanation:

geom_col()

draws each bar with a height equal to the y value in the data.

Since mean_score has already been computed, geom_col() is the right choice here.
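The sketch below checks what geom_col() does. It uses a hypothetical data frame `toy` with made-up scores, not the `student` data. ggplot2's layer_data() returns the data of a built layer. It shows that each bar runs from 0 up to exactly the mean_score computed by summarise().

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy scores for two classes (not the student data)
toy <- data.frame(
  class = c("A", "A", "A", "B", "B"),
  score = c(70, 80, 90, 80, 84)
)

# One summary row per class
toy_mean <- toy %>%
  group_by(class) %>%
  summarise(
    mean_score = mean(score),  # A: 80, B: 82
    sd_score = sd(score),      # A: 10, B: sqrt(8)
    n = n(),                   # A: 3,  B: 2
    .groups = "drop"
  )

# geom_col() uses mean_score directly as the bar height
p <- ggplot(toy_mean, aes(x = class, y = mean_score)) +
  geom_col()

# Built layer data: each bar spans from ymin = 0 to ymax = mean_score
bars <- layer_data(p)
bars[order(as.numeric(bars$x)), c("ymin", "ymax")]
```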

5.4 Styling the mean bar chart

ggplot(class_mean, aes(x = class, y = mean_score, fill = class)) +
  geom_col(width = 0.6) +
  labs(
    title = "不同班级的平均成绩",
    x = "班级",
    y = "平均成绩"
  ) +
  theme_minimal()

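Explanation:

Argument	Meaning
`fill = class`	Fills the bars with a different color for each class
`width = 0.6`	Bar width
`labs()`	Sets the title and axis labels
`theme_minimal()`	Uses a minimal theme

5.5 Adding mean labels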

ggplot(class_mean, aes(x = class, y = mean_score, fill = class)) +
  geom_col(width = 0.6) +
  geom_text(
    aes(label = round(mean_score, 1)),
    vjust = -0.5
  ) +
  labs(
    title = "不同班级的平均成绩",
    x = "班级",
    y = "平均成绩"
  ) +
  ylim(0, 100) +
  theme_minimal()

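Explanation:

geom_text()

adds text labels.

vjust = -0.5

moves each label up so that it sits just above the top of its bar. ylim(0, 100) fixes the y-axis range; note that ylim() removes any data that fall outside this range.

5.6 Adding error bars

Error bars usually show either the standard deviation or the standard error (SE = standard deviation / √n).

Here we first compute the standard error: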

class_mean <- class_mean %>%
  mutate(
    se_score = sd_score / sqrt(n)
  )

class_mean

# A tibble: 3 × 5
  class mean_score sd_score     n se_score
  <ord>      <dbl>    <dbl> <int>    <dbl>
1 一班        77.2     8.39    40    1.33 
2 二班        84.1     6.97    40    1.10 
3 三班        88.1     5.55    40    0.878

A mean bar chart with standard-error error bars:

ggplot(class_mean, aes(x = class, y = mean_score, fill = class)) +
  geom_col(width = 0.6) +
  geom_errorbar(
    aes(
      ymin = mean_score - se_score,
      ymax = mean_score + se_score
    ),
    width = 0.2
  ) +
  geom_text(
    aes(label = round(mean_score, 1)),
    vjust = -1
  ) +
  labs(
    title = "不同班级的平均成绩：均值 ± 标准误",
    x = "班级",
    y = "平均成绩"
  ) +
  ylim(0, 100) +
  theme_minimal()

Explanation:

geom_errorbar()

adds error bars.

ymin = mean_score - se_score
ymax = mean_score + se_score

set the lower and upper ends of each error bar, i.e. the mean ± one standard error.
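Here is a worked example of the error bar arithmetic for a single hypothetical group with scores 70, 80 and 90. The mean is 80 and the standard deviation is 10, so SE = 10 / √3 ≈ 5.77. The bar should therefore run from about 74.23 to 85.77. layer_data() on the error bar layer confirms this.

```r
library(ggplot2)

# Hypothetical single group with scores 70, 80, 90
scores <- c(70, 80, 90)
m  <- mean(scores)                        # 80
se <- sd(scores) / sqrt(length(scores))   # 10 / sqrt(3), about 5.77

summary_df <- data.frame(class = "A", mean_score = m, se_score = se)

p <- ggplot(summary_df, aes(x = class, y = mean_score)) +
  geom_col(width = 0.6) +
  geom_errorbar(
    aes(ymin = mean_score - se_score, ymax = mean_score + se_score),
    width = 0.2
  )

# Layer 2 is the error bar; its ends are the mean minus and plus one SE
err <- layer_data(p, 2)
round(c(err$ymin, err$ymax), 2)   # 74.23 85.77
```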

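6. Histogram + Kernel Density Curve

6.1 What is a histogram?

A histogram shows the distribution of a continuous variable.

For example:

seeing which score range most students fall into.

The x-axis shows score intervals (bins) and the y-axis the count or the density.

6.2 A basic histogram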

ggplot(student, aes(x = score)) +
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

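This draws the most basic histogram.

By default ggplot2 uses 30 bins, which is what the message above reports, and 30 bins are often a poor fit for the data. So you will usually set:

binwidth

6.3 Setting binwidth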

ggplot(student, aes(x = score)) +
  geom_histogram(
    binwidth = 5,
    color = "white",
    fill = "skyblue"
  ) +
  labs(
    title = "学生成绩直方图",
    x = "成绩",
    y = "人数"
  ) +
  theme_minimal()

Explanation:

Argument	Meaning
`binwidth = 5`	Each bin covers 5 points
`color = "white"`	Bar outline color
`fill = "skyblue"`	Bar fill color
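The sketch below checks the bins on made-up scores. It also uses `boundary = 0`, an extra geom_histogram() argument not used above. It aligns the bin edges to multiples of binwidth (60, 65, 70, ...), which makes the intervals easy to read. layer_data() shows the resulting bins.

```r
library(ggplot2)

# Hypothetical scores, for illustration only
scores <- data.frame(score = c(62, 67.5, 71, 74, 78, 78.5, 81, 85, 88, 93, 97))

p <- ggplot(scores, aes(x = score)) +
  geom_histogram(binwidth = 5, boundary = 0, color = "white", fill = "skyblue")

bins <- layer_data(p)
bins$xmax - bins$xmin   # every bin is 5 points wide
bins$xmin               # left edges fall on multiples of 5
sum(bins$count)         # 11: every observation lands in exactly one bin
```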

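6.4 What is a kernel density curve?

A kernel density curve also shows the distribution of a continuous variable.

It is a smoothed estimate of the distribution, so it looks smoother than a histogram; how smooth it is depends on the kernel bandwidth.

6.5 Drawing the kernel density curve alone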

ggplot(student, aes(x = score)) +
  geom_density(
    color = "red",
    linewidth = 1
  ) +
  labs(
    title = "学生成绩核密度曲线",
    x = "成绩",
    y = "密度"
  ) +
  theme_minimal()
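How smooth the curve looks depends on the kernel bandwidth. The sketch below uses simulated, hypothetical scores. geom_density() accepts an `adjust` argument that multiplies the default bandwidth. Values below 1 give a bumpier curve; values above 1 give a smoother one. The default bandwidth follows the "nrd0" rule of stats::density(), and the last lines compute the same numbers directly.

```r
library(ggplot2)

# Hypothetical simulated scores
set.seed(1)
x <- rnorm(200, mean = 80, sd = 8)
scores <- data.frame(score = x)

# Three curves: `adjust` multiplies the default bandwidth
p <- ggplot(scores, aes(x = score)) +
  geom_density(adjust = 0.5, color = "grey50") +  # half the bandwidth: bumpier
  geom_density(adjust = 1, color = "red") +       # default bandwidth
  geom_density(adjust = 2, color = "blue")        # double the bandwidth: smoother

# The same bandwidths from stats::density(), whose default rule is bw.nrd0()
bw_default <- bw.nrd0(x)
bw_half    <- density(x, adjust = 0.5)$bw
bw_double  <- density(x, adjust = 2)$bw
c(bw_half, bw_default, bw_double)
```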

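6.6 Histogram + kernel density curve

To put a histogram and a kernel density curve in one plot, keep in mind:

the histogram's y-axis is the count by default;
the density curve's y-axis is the density by default.

So the histogram's y-axis must also be changed to density:

aes(y = after_stat(density))

Full code: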

ggplot(student, aes(x = score)) +
  geom_histogram(
    aes(y = after_stat(density)),
    binwidth = 5,
    color = "white",
    fill = "skyblue",
    alpha = 0.6
  ) +
  geom_density(
    color = "red",
    linewidth = 1
  ) +
  labs(
    title = "学生成绩分布：直方图 + 核密度曲线",
    x = "成绩",
    y = "密度"
  ) +
  theme_minimal()

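Explanation:

Code	Meaning
`after_stat(density)`	Uses each bin's computed density, count / (n × binwidth), as the bar height instead of the count
`alpha = 0.6`	Sets the transparency
`geom_density()`	Adds the kernel density curve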
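Here is a worked check of what after_stat(density) computes, on made-up scores. Each bar's density is count / (n × binwidth). So the bar areas (density × bin width) sum to 1, the same total area as under a density curve. That is why the two layers can share one y-axis.

```r
library(ggplot2)

# Hypothetical scores, for illustration only
scores <- data.frame(score = c(62, 67.5, 71, 74, 78, 78.5, 81, 85, 88, 93, 97))
n <- nrow(scores)
binwidth <- 5

p <- ggplot(scores, aes(x = score)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = binwidth)
bins <- layer_data(p)

# Density of each bin, computed by hand: count / (n * binwidth)
manual_density <- bins$count / (n * binwidth)

# Total bar area (density x bin width) is 1, like the area under a density curve
total_area <- sum(bins$density * (bins$xmax - bins$xmin))
total_area
```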

6.7 Density curves in different colors by class

ggplot(student, aes(x = score, color = class)) +
  geom_density(linewidth = 1) +
  labs(
    title = "不同班级成绩的核密度曲线",
    x = "成绩",
    y = "密度",
    color = "班级"
  ) +
  theme_minimal()

6.8 Separate histograms for each class

ggplot(student, aes(x = score, fill = class)) +
  geom_histogram(
    binwidth = 5,
    color = "white",
    alpha = 0.7
  ) +
  facet_wrap(~ class) +
  labs(
    title = "不同班级成绩直方图",
    x = "成绩",
    y = "人数"
  ) +
  theme_minimal()

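Explanation:

facet_wrap(~ class)

splits the plot into facets by class, so each class gets its own small panel.

7. Box Plot

7.1 What is a box plot?

A box plot compares the distributions of different groups.

It shows:

the median (the line inside the box);
the first and third quartiles (the bottom and top of the box);
the whiskers, which extend to the most extreme values within 1.5 × IQR (interquartile range, Q3 - Q1) of the box;
outliers (points beyond the whiskers).

Box plots are especially good at answering:

Which group scores higher? Which group's scores vary more? Are there any outliers?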
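The quantities listed above can be reproduced by hand. The sketch below uses one hypothetical group of ten scores. layer_data() returns the numbers geom_boxplot() draws: lower, middle and upper for the box, ymin and ymax for the whiskers, and the outliers. quantile() with R's default method (type 7) gives the same quartiles. Points more than 1.5 × IQR beyond the box are drawn as outliers.

```r
library(ggplot2)

# Hypothetical scores for one group; 100 sits far above the others
box_df <- data.frame(
  group = "A",
  score = c(61, 70, 72, 74, 75, 77, 79, 82, 85, 100)
)

p <- ggplot(box_df, aes(x = group, y = score)) +
  geom_boxplot()
box_stats <- layer_data(p)

# The same numbers by hand
q <- quantile(box_df$score, c(0.25, 0.5, 0.75))  # 72.5, 76, 81.25
iqr <- q[[3]] - q[[1]]                            # 8.75
lower_fence <- q[[1]] - 1.5 * iqr                 # 59.375
upper_fence <- q[[3]] + 1.5 * iqr                 # 94.375
inside <- box_df$score[box_df$score >= lower_fence & box_df$score <= upper_fence]

box_stats[, c("lower", "middle", "upper", "ymin", "ymax")]  # 72.5 76 81.25 61 85
box_stats$outliers[[1]]                                      # 100
```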

7.2 A basic box plot

ggplot(student, aes(x = class, y = score)) +
  geom_boxplot()

7.3 Styling the box plot

ggplot(student, aes(x = class, y = score, fill = class)) +
  geom_boxplot(width = 0.6, alpha = 0.7) +
  labs(
    title = "不同班级成绩盒形图",
    x = "班级",
    y = "成绩"
  ) +
  theme_minimal()

7.4 Adding points to a box plot

Sometimes we want to see every student's actual score. To show them, we can add points.

ggplot(student, aes(x = class, y = score, fill = class)) +
  geom_boxplot(width = 0.6, alpha = 0.5, outlier.shape = NA) +
  geom_jitter(
    width = 0.15,
    height = 0,  # no vertical jitter, so each point shows the exact score
    alpha = 0.5,
    size = 1.5
  ) +
  labs(
    title = "不同班级成绩盒形图 + 散点",
    x = "班级",
    y = "成绩"
  ) +
  theme_minimal()

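Explanation:

Code	Meaning
`geom_boxplot()`	Draws the box plot
`geom_jitter()`	Adds randomly jittered points
`outlier.shape = NA`	Hides the box plot's own outlier points, so they are not drawn a second time next to the jittered points
`width = 0.15`	Controls how far the points are jittered left and right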
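One caveat: by default, geom_jitter() also shifts points vertically. The shift is up to 40% of the smallest gap between distinct values, so the plotted scores are not exactly the true scores. Setting `height = 0` jitters horizontally only. The sketch below compares the two on hypothetical integer scores and reads the plotted positions with layer_data().

```r
library(ggplot2)

# Hypothetical integer scores in two groups
pts <- data.frame(
  g = rep(c("A", "B"), each = 5),
  score = c(70, 72, 75, 78, 80, 82, 84, 85, 88, 90)
)

set.seed(42)  # jitter is random; set the seed before building the plots

p_default <- ggplot(pts, aes(x = g, y = score)) +
  geom_jitter(width = 0.15)              # height left at its default
p_exact <- ggplot(pts, aes(x = g, y = score)) +
  geom_jitter(width = 0.15, height = 0)  # horizontal jitter only

jit_default <- layer_data(p_default)
jit_exact   <- layer_data(p_exact)

# With the default height, the plotted y values drift away from the data
max(abs(sort(jit_default$y) - sort(pts$score)))   # greater than 0
# With height = 0, every plotted y is an exact score
max(abs(sort(jit_exact$y) - sort(pts$score)))     # 0
```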

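8. Violin Plot

8.1 What is a violin plot?

A violin plot can be thought of as:

a box plot that shows the shape of the distribution. Each violin is the group's kernel density curve, mirrored on both sides. On its own it does not mark the median or quartiles, which is why it is often combined with a box plot.

It shows the shape of each group's distribution.

The wider the violin at a given value, the more the data are concentrated there.

8.2 A basic violin plot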

ggplot(student, aes(x = class, y = score)) +
  geom_violin()

8.3 Styling the violin plot

ggplot(student, aes(x = class, y = score, fill = class)) +
  geom_violin(alpha = 0.7) +
  labs(
    title = "不同班级成绩小提琴图",
    x = "班级",
    y = "成绩"
  ) +
  theme_minimal()

8.4 Violin plot + box plot

This is a very common combination:

ggplot(student, aes(x = class, y = score, fill = class)) +
  geom_violin(alpha = 0.6, trim = FALSE) +
  geom_boxplot(width = 0.15, fill = "white", outlier.shape = NA) +
  labs(
    title = "不同班级成绩：小提琴图 + 盒形图",
    x = "班级",
    y = "成绩"
  ) +
  theme_minimal()

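Explanation:

Code	Meaning
`geom_violin()`	Draws the violin plot
`geom_boxplot(width = 0.15)`	Adds a narrow box plot inside each violin
`trim = FALSE`	Does not cut the density tails at the smallest and largest observations, so they can extend past the observed range
`fill = "white"`	Fills the box plot with white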
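The sketch below shows what `trim` changes, using hypothetical scores capped at 100. With the default trim = TRUE, the violin covers only the observed range. With trim = FALSE, the tails extend past the largest and smallest observations, here beyond 100, a score that cannot occur. The y values of the built layer (from layer_data()) show the range each version covers.

```r
library(ggplot2)

# Hypothetical scores capped at 100 (e.g. an exam maximum)
vio <- data.frame(
  g = "A",
  score = c(78, 85, 90, 93, 95, 97, 98, 99, 100, 100)
)

p_trim   <- ggplot(vio, aes(x = g, y = score)) + geom_violin()  # trim = TRUE is the default
p_notrim <- ggplot(vio, aes(x = g, y = score)) + geom_violin(trim = FALSE)

range_trim   <- range(layer_data(p_trim)$y)
range_notrim <- range(layer_data(p_notrim)$y)
range_trim    # the observed range, 78 to 100
range_notrim  # extends below 78 and above 100
```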

8.5 Violin plot + box plot + points

ggplot(student, aes(x = class, y = score, fill = class)) +
  geom_violin(alpha = 0.5, trim = FALSE) +
  geom_boxplot(width = 0.15, fill = "white", outlier.shape = NA) +
  geom_jitter(
    width = 0.12,
    height = 0,  # no vertical jitter, so each point shows the exact score
    alpha = 0.4,
    size = 1.3
  ) +
  labs(
    title = "不同班级成绩：小提琴图 + 盒形图 + 散点",
    x = "班级",
    y = "成绩"
  ) +
  theme_minimal()

9. Comparing Scores by Gender: Box Plots and Violin Plots

Besides class, we can also compare scores by gender.

ggplot(student, aes(x = gender, y = score, fill = gender)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "不同性别成绩盒形图",
    x = "性别",
    y = "成绩"
  ) +
  theme_minimal()

ggplot(student, aes(x = gender, y = score, fill = gender)) +
  geom_violin(alpha = 0.5, trim = FALSE) +
  geom_boxplot(width = 0.15, fill = "white", outlier.shape = NA) +
  labs(
    title = "不同性别成绩小提琴图 + 盒形图",
    x = "性别",
    y = "成绩"
  ) +
  theme_minimal()

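10. Grouped Bar Chart

10.1 What is a grouped bar chart?

A grouped bar chart compares counts across the combinations of two categorical variables.

For example:

How many boys and how many girls are in each class?

Two categorical variables are involved here:

class
gender

10.2 Look at the cross-tabulation first

table(student$class, student$gender)

10.3 Letting ggplot count directly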

ggplot(student, aes(x = class, fill = gender)) +
  geom_bar(position = "dodge") +
  labs(
    title = "不同班级的性别人数分布",
    x = "班级",
    y = "人数",
    fill = "性别"
  ) +
  theme_minimal()

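Explanation:

Code	Meaning
`geom_bar()`	Counts rows automatically and draws the bars
`fill = gender`	Fills the bars by gender
`position = "dodge"`	Places the bars of each group side by side

Note:

geom_bar()

counts the number of rows in each group automatically.

10.4 Stacked bar chart

Without position = "dodge", the bars are stacked by default.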

ggplot(student, aes(x = class, fill = gender)) +
  geom_bar() +
  labs(
    title = "不同班级的性别人数分布：堆叠条形图",
    x = "班级",
    y = "人数",
    fill = "性别"
  ) +
  theme_minimal()

10.5 Percent stacked bar chart

To compare proportions, use:

position = "fill"

ggplot(student, aes(x = class, fill = gender)) +
  geom_bar(position = "fill") +
  labs(
    title = "不同班级的性别比例分布",
    x = "班级",
    y = "比例",
    fill = "性别"
  ) +
  theme_minimal()

Explanation:

position = "fill"

scales every bar to a total height of 1 (i.e. 100%), so each segment shows its group's share within the bar.
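Because position = "fill" puts the y-axis on a 0 to 1 scale, it can be convenient to label it in percent. The sketch below uses hypothetical data and scale_y_continuous(labels = scales::percent); the scales package is installed together with ggplot2. layer_data() confirms that each bar tops out at 1 and that each segment is its group's share.

```r
library(ggplot2)

# Hypothetical raw data: one row per person
people <- data.frame(
  class  = c("A", "A", "A", "A", "B", "B", "B"),
  gender = c("F", "M", "M", "M", "F", "F", "M")
)

p <- ggplot(people, aes(x = class, fill = gender)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +  # show proportions as percentages
  labs(y = "Proportion")

stacked <- layer_data(p)
x_pos <- as.numeric(stacked$x)
seg_height <- stacked$ymax - stacked$ymin   # share of each gender within its bar

tapply(stacked$ymax, x_pos, max)   # 1 for every class
sort(seg_height[x_pos == 1])       # class A: F is 1 of 4, M is 3 of 4
```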

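10.6 Counting first, then drawing the grouped bar chart

Computing the counts first is usually better, because you can check the numbers directly and reuse them, for example as bar labels.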

class_gender_count <- student %>%
  count(class, gender)

class_gender_count

  class gender  n
1  一班     女 17
2  一班     男 23
3  二班     女 15
4  二班     男 25
5  三班     女 23
6  三班     男 17

Plot:

ggplot(class_gender_count, aes(x = class, y = n, fill = gender)) +
  geom_col(position = "dodge", width = 0.7) +
  labs(
    title = "不同班级男女生人数",
    x = "班级",
    y = "人数",
    fill = "性别"
  ) +
  theme_minimal()

Explanation:

Here we use:

geom_col()

because n has already been computed.

10.7 Adding count labels

ggplot(class_gender_count, aes(x = class, y = n, fill = gender)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.7) +
  geom_text(
    aes(label = n),
    position = position_dodge(width = 0.7),
    vjust = -0.4
  ) +
  labs(
    title = "不同班级男女生人数",
    x = "班级",
    y = "人数",
    fill = "性别"
  ) +
  ylim(0, max(class_gender_count$n) + 5) +
  theme_minimal()

Note:

If the bars use:

position_dodge(width = 0.7)

the labels must use the same position_dodge(width = 0.7); otherwise the labels may not line up with their bars.
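The sketch below checks the alignment on hypothetical counts shaped like the count(class, gender) result. Storing position_dodge(width = 0.7) in one object and passing it to both layers guarantees the same setting. layer_data() shows that the dodged x positions of the bars and of the labels coincide.

```r
library(ggplot2)

# Hypothetical pre-computed counts, shaped like the output of count(class, gender)
cnt <- data.frame(
  class  = c("A", "A", "B", "B"),
  gender = c("F", "M", "F", "M"),
  n      = c(17, 23, 15, 25)
)

# One dodge specification, reused by both layers
dodge <- position_dodge(width = 0.7)

p <- ggplot(cnt, aes(x = class, y = n, fill = gender)) +
  geom_col(position = dodge, width = 0.7) +
  geom_text(aes(label = n), position = dodge, vjust = -0.4)

# Dodged x positions of the bars (layer 1) and of the labels (layer 2)
bar_x   <- sort(as.numeric(layer_data(p, 1)$x))
label_x <- sort(as.numeric(layer_data(p, 2)$x))
bar_x    # 0.825 1.175 1.825 2.175
label_x  # the same positions
```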

11. Grouped Mean Bar Chart

Besides counts, we can also plot:

the average score of boys and girls in each class.

11.1 Computing the grouped means first

class_gender_mean <- student %>%
  group_by(class, gender) %>%
  summarise(
    mean_score = mean(score),
    sd_score = sd(score),
    n = n(),
    se_score = sd_score / sqrt(n),
    .groups = "drop"
  )

class_gender_mean

# A tibble: 6 × 6
  class gender mean_score sd_score     n se_score
  <ord> <chr>       <dbl>    <dbl> <int>    <dbl>
1 一班  女           75.5     7.41    17     1.80
2 一班  男           78.3     9.02    23     1.88
3 二班  女           83.1     6.49    15     1.68
4 二班  男           84.7     7.30    25     1.46
5 三班  女           88.9     5.79    23     1.21
6 三班  男           86.9     5.14    17     1.25

11.2 Drawing the grouped mean bar chart

ggplot(class_gender_mean, aes(x = class, y = mean_score, fill = gender)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.7) +
  geom_errorbar(
    aes(
      ymin = mean_score - se_score,
      ymax = mean_score + se_score
    ),
    position = position_dodge(width = 0.7),
    width = 0.2
  ) +
  geom_text(
    aes(label = round(mean_score, 1)),
    position = position_dodge(width = 0.7),
    vjust = -0.7
  ) +
  labs(
    title = "不同班级、不同性别的平均成绩",
    x = "班级",
    y = "平均成绩",
    fill = "性别"
  ) +
  ylim(0, 100) +
  theme_minimal()

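This chart is very common and is well suited to showing:

whether boys and girls differ in average score within each class.

12. Setting Colors

Colors can be set manually.

12.1 Setting fill colors manually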

ggplot(class_gender_count, aes(x = class, y = n, fill = gender)) +
  geom_col(position = "dodge", width = 0.7) +
  scale_fill_manual(
    values = c("女" = "#F8766D", "男" = "#00BFC4")
  ) +
  labs(
    title = "不同班级男女生人数",
    x = "班级",
    y = "人数",
    fill = "性别"
  ) +
  theme_minimal()
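Naming the colors in `values` matters. A named vector matches each color to the level with that name, whatever the order of the rows or of the vector. The sketch below uses hypothetical counts, with the codes "F"/"M" standing in for 女/male, and checks the result with layer_data().

```r
library(ggplot2)

# Hypothetical counts; note that the data list "M" before "F"
cnt <- data.frame(gender = c("M", "F"), n = c(65, 55))

p <- ggplot(cnt, aes(x = gender, y = n, fill = gender)) +
  geom_col() +
  # Named vector: each color goes to the level with that name,
  # regardless of the order of the names or of the rows in the data
  scale_fill_manual(values = c(F = "#F8766D", M = "#00BFC4"))

filled <- layer_data(p)
x_pos <- as.numeric(filled$x)  # x = 1 is level "F", x = 2 is "M"
filled$fill[order(x_pos)]      # "#F8766D" "#00BFC4"
```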

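12.2 Removing the legend

When the colors add no extra information (here each class is already labeled on the x-axis), the legend can be removed.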

ggplot(class_mean, aes(x = class, y = mean_score, fill = class)) +
  geom_col(width = 0.6) +
  labs(
    title = "不同班级平均成绩",
    x = "班级",
    y = "平均成绩"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

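13. Saving Plots

ggsave() saves a plot; if no plot is specified, it saves the last plot displayed.

It is better to store the plot in an object first: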

p1 <- ggplot(class_mean, aes(x = class, y = mean_score, fill = class)) +
  geom_col(width = 0.6) +
  labs(
    title = "不同班级平均成绩",
    x = "班级",
    y = "平均成绩"
  ) +
  theme_minimal()

p1

Save the plot:

ggsave(
  filename = "class_mean_bar.png",
  plot = p1,
  width = 6,
  height = 4,
  dpi = 300
)

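Explanation:

Argument	Meaning
`filename`	File name; the file type is chosen from the extension (e.g. .png)
`plot`	The plot to save
`width`	Width (in inches unless `units` is set)
`height`	Height (in inches unless `units` is set)
`dpi`	Resolution in dots per inch

In the source document this chunk is set to eval=FALSE, so knitting the document does not save the image file.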
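The runnable sketch below avoids leaving files behind. It writes a hypothetical plot to a temporary PDF, with the path from tempfile(). It sets `units = "cm"`, because width and height are otherwise read as inches. The file type follows the extension. The labels are ASCII because the default PDF device may not render Chinese characters.

```r
library(ggplot2)

# Hypothetical summary data with ASCII labels
means <- data.frame(class = c("A", "B", "C"), mean_score = c(77, 84, 88))
p1 <- ggplot(means, aes(x = class, y = mean_score)) +
  geom_col(width = 0.6)

# Save to a temporary PDF; the ".pdf" extension selects the file type
out_file <- tempfile(fileext = ".pdf")
ggsave(
  filename = out_file,
  plot = p1,
  width = 15,
  height = 10,
  units = "cm"   # without this, width and height are read as inches
)
file.exists(out_file)  # TRUE
```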

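14. Common Mistakes

14.1 Mistake: confusing geom_bar and geom_col

If you already have the y values, such as the mean score:

mean_score

use:

geom_col()

If you want ggplot to count the rows for you, use:

geom_bar()

Comparison:

Situation	Function
Counting rows automatically	`geom_bar()`
Means, counts, or proportions already computed	`geom_col()`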
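The sketch below shows that the two routes give the same bars. It uses a hypothetical raw data frame with one row per student. geom_bar() counts the rows itself, while geom_col() plots counts computed beforehand with dplyr's count(). layer_data() returns the bar heights for comparison.

```r
library(ggplot2)
library(dplyr)

# Hypothetical raw data: one row per student
raw <- data.frame(class = c("A", "A", "A", "B", "B"))

# geom_bar(): ggplot counts the rows in each class itself
p_bar <- ggplot(raw, aes(x = class)) + geom_bar()

# geom_col(): the counts are computed first and passed in as y
counts <- raw %>% count(class)
p_col <- ggplot(counts, aes(x = class, y = n)) + geom_col()

# Bar heights, ordered by x position
bar_data <- layer_data(p_bar)
col_data <- layer_data(p_col)
heights_bar <- as.numeric(bar_data$y[order(as.numeric(bar_data$x))])
heights_col <- as.numeric(col_data$y[order(as.numeric(col_data$x))])
heights_bar  # 3 2
heights_col  # 3 2
```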

14.2 Mistake: the histogram and density curve use different y-axis scales

Incorrect example:

ggplot(student, aes(x = score)) +
  geom_histogram(binwidth = 5) +
  geom_density(color = "red", linewidth = 1) +
  theme_minimal()

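Here the histogram's y-axis is the count while the density curve's is the density. The two are on different scales, so the density curve ends up squashed near zero.

Correct version: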

ggplot(student, aes(x = score)) +
  geom_histogram(
    aes(y = after_stat(density)),
    binwidth = 5,
    fill = "skyblue",
    color = "white",
    alpha = 0.6
  ) +
  geom_density(color = "red", linewidth = 1) +
  theme_minimal()

14.3 Mistake: labels not aligned with the bars

If the bars use:

position_dodge(width = 0.7)

the text labels must use the same:

position_dodge(width = 0.7)

Correct example:

ggplot(class_gender_count, aes(x = class, y = n, fill = gender)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.7) +
  geom_text(
    aes(label = n),
    position = position_dodge(width = 0.7),
    vjust = -0.4
  ) +
  theme_minimal()

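14.4 Mistake: forgetting to load ggplot2

If you get the error:

could not find function "ggplot"

ggplot2 has not been loaded.

Fix:

library(ggplot2)

14.5 Mistake: garbled Chinese text

If Chinese text in a plot title appears garbled (for example, as empty boxes), the cause is usually the system fonts.

You can leave this alone for now. If you need to specify a font, you can use an additional package such as showtext, which will be covered later.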

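15. Exercises

The code below creates a product sales data set. The values are in Chinese: region is 华东/华北/华南 (East/North/South China), category is 食品/服装/电子 (food/clothing/electronics), and channel is 线上/线下 (online/offline).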

set.seed(456)

sales <- data.frame(
  id = 1:150,
  region = sample(c("华东", "华北", "华南"), 150, replace = TRUE),
  category = sample(c("食品", "服装", "电子"), 150, replace = TRUE),
  channel = sample(c("线上", "线下"), 150, replace = TRUE),
  amount = round(rnorm(150, mean = 500, sd = 120), 1)
)

sales$region <- factor(
  sales$region,
  levels = c("华东", "华北", "华南"),
  ordered = TRUE
)

head(sales, 5)

  id region category channel amount
1  1   华东     食品    线上  588.5
2  2   华东     食品    线下  809.4
3  3   华南     服装    线上  685.6
4  4   华北     电子    线上  410.6
5  5   华东     电子    线上  548.9

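Tasks:

Compute the average sales amount for each region;
Draw a bar chart of the average sales amount by region;
Draw a histogram + kernel density curve of the sales amount;
Draw box plots of the sales amount by region;
Draw violin plots + box plots of the sales amount by region;
Draw a grouped bar chart by region and channel;
Draw a grouped bar chart of the average sales amount by region and channel.

16. Exercise Solutions

16.1 Average sales amount by region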

region_mean <- sales %>%
  group_by(region) %>%
  summarise(
    mean_amount = mean(amount),
    sd_amount = sd(amount),
    n = n(),
    se_amount = sd_amount / sqrt(n),
    .groups = "drop"
  )

region_mean

# A tibble: 3 × 5
  region mean_amount sd_amount     n se_amount
  <ord>        <dbl>     <dbl> <int>     <dbl>
1 华东          516.      135.    60      17.4
2 华北          527.      106.    47      15.5
3 华南          504.      124.    43      18.9

16.2 Bar chart of the average sales amount by region

ggplot(region_mean, aes(x = region, y = mean_amount, fill = region)) +
  geom_col(width = 0.6) +
  geom_errorbar(
    aes(
      ymin = mean_amount - se_amount,
      ymax = mean_amount + se_amount
    ),
    width = 0.2
  ) +
  geom_text(
    aes(label = round(mean_amount, 1)),
    vjust = -0.7
  ) +
  labs(
    title = "不同地区平均销售额",
    x = "地区",
    y = "平均销售额"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

16.3 Histogram + kernel density curve of the sales amount

ggplot(sales, aes(x = amount)) +
  geom_histogram(
    aes(y = after_stat(density)),
    binwidth = 50,
    fill = "skyblue",
    color = "white",
    alpha = 0.6
  ) +
  geom_density(
    color = "red",
    linewidth = 1
  ) +
  labs(
    title = "销售额分布：直方图 + 核密度曲线",
    x = "销售额",
    y = "密度"
  ) +
  theme_minimal()

16.4 Box plots of the sales amount by region

ggplot(sales, aes(x = region, y = amount, fill = region)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "不同地区销售额盒形图",
    x = "地区",
    y = "销售额"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

16.5 Violin plots + box plots of the sales amount by region

ggplot(sales, aes(x = region, y = amount, fill = region)) +
  geom_violin(alpha = 0.5, trim = FALSE) +
  geom_boxplot(width = 0.15, fill = "white", outlier.shape = NA) +
  labs(
    title = "不同地区销售额：小提琴图 + 盒形图",
    x = "地区",
    y = "销售额"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

16.6 Grouped bar chart by region and channel

region_channel_count <- sales %>%
  count(region, channel)

region_channel_count

  region channel  n
1   华东    线上 34
2   华东    线下 26
3   华北    线上 22
4   华北    线下 25
5   华南    线上 18
6   华南    线下 25

ggplot(region_channel_count, aes(x = region, y = n, fill = channel)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.7) +
  geom_text(
    aes(label = n),
    position = position_dodge(width = 0.7),
    vjust = -0.4
  ) +
  labs(
    title = "不同地区、不同渠道的订单数量",
    x = "地区",
    y = "订单数量",
    fill = "渠道"
  ) +
  ylim(0, max(region_channel_count$n) + 5) +
  theme_minimal()

16.7 Grouped bar chart of the average sales amount by region and channel

region_channel_mean <- sales %>%
  group_by(region, channel) %>%
  summarise(
    mean_amount = mean(amount),
    sd_amount = sd(amount),
    n = n(),
    se_amount = sd_amount / sqrt(n),
    .groups = "drop"
  )

region_channel_mean

# A tibble: 6 × 6
  region channel mean_amount sd_amount     n se_amount
  <ord>  <chr>         <dbl>     <dbl> <int>     <dbl>
1 华东   线上           523.     138.     34      23.7
2 华东   线下           505.     133.     26      26.2
3 华北   线上           508.      98.0    22      20.9
4 华北   线下           544.     112.     25      22.4
5 华南   线上           495.     113.     18      26.7
6 华南   线下           511.     133.     25      26.6

ggplot(region_channel_mean, aes(x = region, y = mean_amount, fill = channel)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.7) +
  geom_errorbar(
    aes(
      ymin = mean_amount - se_amount,
      ymax = mean_amount + se_amount
    ),
    position = position_dodge(width = 0.7),
    width = 0.2
  ) +
  geom_text(
    aes(label = round(mean_amount, 1)),
    position = position_dodge(width = 0.7),
    vjust = -0.7
  ) +
  labs(
    title = "不同地区、不同渠道的平均销售额",
    x = "地区",
    y = "平均销售额",
    fill = "渠道"
  ) +
  theme_minimal()

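17. Summary of Core Code

The following is template code. df, group_var, x_var, y_var and similar names are placeholders for your own data and variables. The chunk is therefore set to eval=FALSE in the source document: it is shown but not run.

# Load packages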
library(ggplot2)
library(dplyr)

# 1. Compute group means
mean_df <- df %>%
  group_by(group_var) %>%
  summarise(
    mean_y = mean(y_var),
    sd_y = sd(y_var),
    n = n(),
    se_y = sd_y / sqrt(n),
    .groups = "drop"
  )

# 2. Mean bar chart
ggplot(mean_df, aes(x = group_var, y = mean_y, fill = group_var)) +
  geom_col() +
  geom_errorbar(
    aes(ymin = mean_y - se_y, ymax = mean_y + se_y),
    width = 0.2
  ) +
  theme_minimal()

# 3. Histogram
ggplot(df, aes(x = x_var)) +
  geom_histogram(binwidth = 5)

# 4. Histogram + kernel density curve
ggplot(df, aes(x = x_var)) +
  geom_histogram(
    aes(y = after_stat(density)),
    binwidth = 5,
    fill = "skyblue",
    color = "white",
    alpha = 0.6
  ) +
  geom_density(color = "red", linewidth = 1) +
  theme_minimal()

# 5. Box plot
ggplot(df, aes(x = group_var, y = y_var, fill = group_var)) +
  geom_boxplot() +
  theme_minimal()

# 6. Violin plot + box plot
ggplot(df, aes(x = group_var, y = y_var, fill = group_var)) +
  geom_violin(alpha = 0.5, trim = FALSE) +
  geom_boxplot(width = 0.15, fill = "white") +
  theme_minimal()

# 7. Grouped bar chart with automatic counts
ggplot(df, aes(x = group_var1, fill = group_var2)) +
  geom_bar(position = "dodge") +
  theme_minimal()

# 8. Count first, then draw a grouped bar chart
count_df <- df %>%
  count(group_var1, group_var2)

ggplot(count_df, aes(x = group_var1, y = n, fill = group_var2)) +
  geom_col(position = "dodge") +
  theme_minimal()

# 9. Grouped mean bar chart
mean_df2 <- df %>%
  group_by(group_var1, group_var2) %>%
  summarise(
    mean_y = mean(y_var),
    se_y = sd(y_var) / sqrt(n()),
    .groups = "drop"
  )

ggplot(mean_df2, aes(x = group_var1, y = mean_y, fill = group_var2)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.7) +
  geom_errorbar(
    aes(ymin = mean_y - se_y, ymax = mean_y + se_y),
    position = position_dodge(width = 0.7),
    width = 0.2
  ) +
  theme_minimal()

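18. One-Sentence Summary

What you most need to remember from this section:

geom_col()

is for bar charts of values that are already computed, such as mean bar charts.

geom_bar()

is for bar charts that count rows automatically.

geom_histogram()

is for histograms.

geom_density()

is for kernel density curves.

geom_boxplot()

is for box plots.

geom_violin()

is for violin plots.

The most common combinations are: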

ggplot(df, aes(x = group, y = value, fill = group)) +
  geom_violin() +
  geom_boxplot(width = 0.15)

and:

ggplot(df, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density))) +
  geom_density()