course_demo_ggplot

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
sample_data |> glimpse() 
Rows: 1,000
Columns: 4
$ job       <chr> "Salesperson", "Office Worker", "Engineer", "Office Worker",…
$ age       <dbl> 28, 25, 36, 34, 28, 29, 38, 30, 34, 35, 33, 26, 31, 34, 26, …
$ education <fct> Bachelor's Degree, Bachelor's Degree, Bachelor's Degree, Bac…
$ income    <dbl> 77971.7, 43284.3, 81532.6, 52892.4, 50579.5, 56568.3, 44288.…
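This section never shows where `sample_data` comes from. To run the examples without the course file, here is a minimal sketch that builds a hypothetical stand-in with the same column names and types as the `glimpse()` output above. All values are simulated, and the means, spreads and extra education levels are arbitrary assumptions, not the course data.

```r
library(tibble)

# Hypothetical stand-in for sample_data: simulated values, NOT the course data.
# Column names and types mirror the glimpse() output (chr, dbl, fct, dbl).
set.seed(42)
n <- 1000
sample_data <- tibble(
  job       = sample(c("Salesperson", "Office Worker", "Engineer"), n, replace = TRUE),
  age       = round(rnorm(n, mean = 31, sd = 4)),          # arbitrary mean/sd
  education = factor(sample(c("High School", "Bachelor's Degree", "Master's Degree"),
                            n, replace = TRUE)),           # assumed levels
  income    = round(rnorm(n, mean = 60000, sd = 15000), 1) # arbitrary mean/sd
)

dplyr::glimpse(sample_data)
```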
Note

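In ggplot2, building a plot is like building a house. First lay the foundation with ggplot(data, aes()): data is the building material (the data set), and aes() is the blueprint that maps data variables onto visual elements of the plot (for example, age on the x-axis, income on the y-axis, and education deciding the color groups). Then add geoms on top to draw the actual layers.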
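The "foundation first, layers second" idea can be seen directly in code: a `ggplot()` call on its own is a valid object with no layers, and `+ geom_*()` adds one. A small sketch, using a tiny hypothetical data frame in place of `sample_data`:

```r
library(ggplot2)

# Tiny hypothetical data frame, only to have variables to map
df <- data.frame(
  age       = c(25, 30, 35, 40),
  income    = c(40000, 52000, 61000, 70000),
  education = factor(c("A", "B", "A", "B"))
)

# Foundation: data + aesthetic mapping, no layers yet
base <- ggplot(data = df, mapping = aes(x = age, y = income, color = education))

# Build on the same foundation by adding a geom layer
p <- base + geom_point()
p
```

Storing the foundation in a variable such as `base` lets you try several geoms on the same mapping without retyping it.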

One variable: categorical

Bar chart

ggplot(data = sample_data,
       mapping = aes(x = education)
) + 
  geom_bar()
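`geom_bar()` counts the rows in each category for you (its default stat is `"count"`). When the counts are already computed, `geom_col()` draws the given heights instead. A sketch with hypothetical education values showing that both routes produce the same bar heights:

```r
library(ggplot2)
library(dplyr)

# Hypothetical values standing in for sample_data$education
df <- data.frame(education = factor(c("Bachelor's Degree", "Master's Degree",
                                      "Bachelor's Degree", "High School")))

# geom_bar(): ggplot2 counts rows per category itself
p_bar <- ggplot(df, aes(x = education)) + geom_bar()

# geom_col(): count first, then draw the precomputed heights
counts <- df |> count(education)
p_col  <- ggplot(counts, aes(x = education, y = n)) + geom_col()
```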

One variable: numeric

Histogram

ggplot(data = sample_data,
       mapping = aes(x = age)
) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
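The message above means ggplot2 fell back to 30 bins. Since age is recorded in whole years, one reasonable choice is one bin per year via `binwidth = 1`. Setting `binwidth` (or `bins`) explicitly also silences the message. A sketch on simulated ages (a hypothetical stand-in for `sample_data$age`):

```r
library(ggplot2)

# Hypothetical ages standing in for sample_data$age
set.seed(1)
df <- data.frame(age = round(rnorm(200, mean = 31, sd = 4)))

# One bar per year of age; an explicit binwidth means no bins = 30 message
p <- ggplot(df, aes(x = age)) +
  geom_histogram(binwidth = 1)
```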

Density plot

ggplot(data = sample_data,
       mapping = aes(x = age)
) +
  geom_density()
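How wiggly a density curve looks depends on its smoothing bandwidth. The `adjust` argument multiplies the default bandwidth: values below 1 follow the data more closely, values above 1 smooth more. A sketch on simulated ages (hypothetical data) comparing the two directions:

```r
library(ggplot2)

# Hypothetical ages standing in for sample_data$age
set.seed(1)
df <- data.frame(age = round(rnorm(200, mean = 31, sd = 4)))

# adjust < 1: narrower bandwidth, bumpier curve
p_rough  <- ggplot(df, aes(x = age)) + geom_density(adjust = 0.5)
# adjust > 1: wider bandwidth, smoother curve
p_smooth <- ggplot(df, aes(x = age)) + geom_density(adjust = 2)
```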

Box plot

ggplot(data = sample_data,
       mapping = aes(x = age)
) +
  geom_boxplot()

Two variables: numeric + numeric

Scatter plot

ggplot(data = sample_data,
       mapping = aes(x = age, y = income)
) +
  geom_point()

Line chart

ggplot(data = sample_data,
       mapping = aes(x = age, y = income)
) +
  geom_line()
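`geom_line()` connects observations in order of x. With many different incomes at the same age, as in `sample_data`, the line jumps up and down within each age and looks like a zigzag. One common alternative is to summarise first, for example mean income per age, and then draw one line through the summaries. A sketch on simulated data (hypothetical stand-in):

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in: several incomes per age, like sample_data
set.seed(1)
df <- data.frame(
  age    = sample(25:38, 300, replace = TRUE),
  income = rnorm(300, mean = 60000, sd = 15000)
)

# Collapse to one value per age so geom_line() draws a single trend
by_age <- df |>
  group_by(age) |>
  summarise(mean_income = mean(income))

p <- ggplot(by_age, aes(x = age, y = mean_income)) +
  geom_line() +
  geom_point()
```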

Fitted (smoothed) curve

ggplot(data = sample_data,
       mapping = aes(x = age, y = income)
) +
  geom_smooth()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
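By default `geom_smooth()` picks its method from the group size: loess for fewer than 1,000 observations, `mgcv::gam()` otherwise. That is why this 1,000-row plot reports `gam`, while the per-education curves further down report `loess`. To get the same smoother regardless of size, state it explicitly. A sketch fitting a straight line on simulated data (hypothetical values):

```r
library(ggplot2)

# Hypothetical data with a roughly linear age-income relationship
set.seed(1)
df <- data.frame(age = runif(100, 25, 38))
df$income <- 20000 + 1500 * df$age + rnorm(100, sd = 5000)

# Explicit linear fit; se = FALSE drops the confidence band
p <- ggplot(df, aes(x = age, y = income)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```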

Two variables: categorical + numeric

Box plot

ggplot(data = sample_data,
       mapping = aes(x = education, y = income)
) +
  geom_boxplot()

Violin plot

ggplot(data = sample_data,
       mapping = aes(x = education, y = income)
) +
  geom_violin()
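Since geoms stack like layers of the house, the violin and box plots above can be combined: the violin shows the shape of each distribution, and a narrow box plot drawn on top adds the median and quartiles. Layers are drawn in the order they are added. A sketch on simulated incomes for two hypothetical education groups:

```r
library(ggplot2)

# Hypothetical incomes for two education groups
set.seed(1)
df <- data.frame(
  education = factor(rep(c("High School", "Bachelor's Degree"), each = 50)),
  income    = c(rnorm(50, 45000, 8000), rnorm(50, 60000, 10000))
)

# Violin first, then a narrow box plot drawn over it
p <- ggplot(df, aes(x = education, y = income)) +
  geom_violin() +
  geom_boxplot(width = 0.1)
```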

Adding a categorical variable with “fill” / “color” on top of the basic plots

Two variables: categorical + categorical

Bar chart + “fill”

ggplot(data = sample_data,
       mapping = aes(x = education, fill = job)
) + 
  geom_bar()
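With `fill`, `geom_bar()` stacks the job segments by default (`position = "stack"`). Two other positions are often easier to compare: `"dodge"` puts the bars side by side, and `"fill"` stretches every bar to height 1 so each shows the job proportions within one education level. A sketch on a few hypothetical rows:

```r
library(ggplot2)

# A few hypothetical rows standing in for sample_data
df <- data.frame(
  education = c("Bachelor's Degree", "Bachelor's Degree",
                "Master's Degree", "Master's Degree", "Master's Degree"),
  job       = c("Engineer", "Salesperson", "Engineer", "Engineer", "Office Worker")
)

# Side-by-side bars: compare absolute counts per job
p_dodge <- ggplot(df, aes(x = education, fill = job)) + geom_bar(position = "dodge")
# Each bar scaled to 1: compare proportions within each education level
p_fill  <- ggplot(df, aes(x = education, fill = job)) + geom_bar(position = "fill")
```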

Three variables: numeric + numeric + categorical

Scatter plot + “color”

ggplot(data = sample_data,
       mapping = aes(x = age, y = income, color = education)
) + 
  geom_point()

Fitted curve + “color”

ggplot(data = sample_data,
       mapping = aes(x = age, y = income, color = education)
) + 
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
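Because `color = education` is set once in `ggplot()`, every layer added afterwards inherits it. So the scatter plot and the fitted curves can share one foundation, giving one point color and one curve per education level. A sketch on simulated data for two hypothetical groups, using a linear fit (swapped in here for brevity instead of the default loess):

```r
library(ggplot2)

# Hypothetical data: two education groups with different income levels
set.seed(1)
df <- data.frame(
  age       = runif(120, 25, 38),
  education = factor(rep(c("High School", "Bachelor's Degree"), each = 60))
)
df$income <- 20000 + 1500 * df$age +
  ifelse(df$education == "High School", 0, 8000) + rnorm(120, sd = 5000)

# color mapped once in ggplot(): both layers are split by education
p <- ggplot(df, aes(x = age, y = income, color = education)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```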

Two variables: numeric + categorical

Density plot + “color”

ggplot(data = sample_data,
       mapping = aes(x = age, color = job)
) +
  geom_density()

Density plot + “fill”

ggplot(data = sample_data,
       mapping = aes(x = age, fill = job)
) +
  geom_density(alpha = 0.5) # 50% opacity so overlapping curves stay visible