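The Complete ggplot2 Tutorial - Part 1 | Introduction to ggplot2

In the short tutorial from the ggplot2 study notes, we took a quick tour of making charts with the ggplot2 package. It touched only briefly on the various aspects of building a ggplot.

This one is a complete tutorial. I start from scratch and discuss how to construct and customize almost any ggplot. It covers the principles, steps, and nuances that make plots effective and more visually appealing. For practical purposes, I hope this tutorial serves well as a bookmarked reference for your day-to-day plotting work.

This is Part 1 of a three-part tutorial on ggplot2, an aesthetically pleasing (and very popular) graphics framework in R. The tutorial is aimed mainly at users who have some basic knowledge of the R programming language and ggplot2 and want to make complex, good-looking charts.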

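1. Understanding the ggplot syntax

If you are a beginner or mostly use base graphics, the syntax for constructing ggplots can be confusing. The main difference is that, unlike base graphics, ggplot works with data frames rather than individual vectors. Typically, all the data needed to make the plot is contained in the data frame supplied to ggplot() itself, or it can be supplied to the respective geoms.

The second noticeable feature is that you can keep enhancing a plot by adding more layers (and themes) to an existing plot created with the ggplot() function.

Let's initialize a basic ggplot based on the midwest dataset.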

# Setup
# options(scipen=999)  # turn off scientific notation like 1e+06
library(tidyverse)

## -- Attaching packages ------------------------ tidyverse 1.3.0 --

## √ ggplot2 3.2.1     √ purrr   0.3.3
## √ tibble  2.1.3     √ dplyr   0.8.3
## √ tidyr   1.0.0     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.4.0

## -- Conflicts --------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

data("midwest", package = "ggplot2")  # load the data
# midwest <- read.csv("http://goo.gl/G1K41K") # alt source 

# Init Ggplot
ggplot(midwest, aes(x=area, y=poptotal))  # area and poptotal are columns in 'midwest'

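A blank ggplot is drawn. Even though x and y are specified, there are no points or lines in it. This is because ggplot does not assume that you meant to draw a scatterplot or a line chart. I have only told ggplot which dataset to use and which columns should be used for the X and Y axes.

Also note that the aes() function is used to specify the X and Y axes. This is because any information that is part of the source data frame has to be specified inside the aes() function.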
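To confirm that the call above really produces an empty canvas, the plot object can be inspected directly. This is only a minimal sketch: the variable name p is introduced here for illustration, and reading p$layers and p$mapping assumes that ggplot2 exposes those fields on the plot object.

```r
library(ggplot2)
data("midwest", package = "ggplot2")

# Same call as above, stored in an illustrative variable so it can be inspected
p <- ggplot(midwest, aes(x = area, y = poptotal))

length(p$layers)    # number of geom/stat layers added so far
names(p$mapping)    # aesthetics that are mapped to columns of 'midwest'
```

A plot with no layers has nothing to draw, which is why only the axes and background appear.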

2. How to make a simple scatterplot

Let's make a scatterplot on top of the blank ggplot by adding points using a geom layer called geom_point.

midwest %>% 
  head()

## # A tibble: 6 x 28
##     PID county state  area poptotal popdensity popwhite popblack popamerindian
##   <int> <chr>  <chr> <dbl>    <int>      <dbl>    <int>    <int>         <int>
## 1   561 ADAMS  IL    0.052    66090      1271.    63917     1702            98
## 2   562 ALEXA~ IL    0.014    10626       759      7054     3496            19
## 3   563 BOND   IL    0.022    14991       681.    14477      429            35
## 4   564 BOONE  IL    0.017    30806      1812.    29344      127            46
## 5   565 BROWN  IL    0.018     5836       324.     5264      547            14
## 6   566 BUREAU IL    0.05     35688       714.    35157       50            65
## # ... with 19 more variables: popasian <int>, popother <int>, percwhite <dbl>,
## #   percblack <dbl>, percamerindan <dbl>, percasian <dbl>, percother <dbl>,
## #   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
## #   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
## #   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
## #   percelderlypoverty <dbl>, inmetro <int>, category <chr>

midwest %>% glimpse()

## Observations: 437
## Variables: 28
## $ PID                  <int> 561, 562, 563, 564, 565, 566, 567, 568, 569, 5...
## $ county               <chr> "ADAMS", "ALEXANDER", "BOND", "BOONE", "BROWN"...
## $ state                <chr> "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL"...
## $ area                 <dbl> 0.052, 0.014, 0.022, 0.017, 0.018, 0.050, 0.01...
## $ poptotal             <int> 66090, 10626, 14991, 30806, 5836, 35688, 5322,...
## $ popdensity           <dbl> 1270.9615, 759.0000, 681.4091, 1812.1176, 324....
## $ popwhite             <int> 63917, 7054, 14477, 29344, 5264, 35157, 5298, ...
## $ popblack             <int> 1702, 3496, 429, 127, 547, 50, 1, 111, 16, 165...
## $ popamerindian        <int> 98, 19, 35, 46, 14, 65, 8, 30, 8, 331, 51, 26,...
## $ popasian             <int> 249, 48, 16, 150, 5, 195, 15, 61, 23, 8033, 89...
## $ popother             <int> 124, 9, 34, 1139, 6, 221, 0, 84, 6, 1596, 20, ...
## $ percwhite            <dbl> 96.71206, 66.38434, 96.57128, 95.25417, 90.198...
## $ percblack            <dbl> 2.57527614, 32.90043290, 2.86171703, 0.4122573...
## $ percamerindan        <dbl> 0.14828264, 0.17880670, 0.23347342, 0.14932156...
## $ percasian            <dbl> 0.37675897, 0.45172219, 0.10673071, 0.48691813...
## $ percother            <dbl> 0.18762294, 0.08469791, 0.22680275, 3.69733169...
## $ popadults            <int> 43298, 6724, 9669, 19272, 3979, 23444, 3583, 1...
## $ perchsd              <dbl> 75.10740, 59.72635, 69.33499, 75.47219, 68.861...
## $ percollege           <dbl> 19.63139, 11.24331, 17.03382, 17.27895, 14.476...
## $ percprof             <dbl> 4.355859, 2.870315, 4.488572, 4.197800, 3.3676...
## $ poppovertyknown      <int> 63628, 10529, 14235, 30337, 4815, 35107, 5241,...
## $ percpovertyknown     <dbl> 96.27478, 99.08714, 94.95697, 98.47757, 82.505...
## $ percbelowpoverty     <dbl> 13.151443, 32.244278, 12.068844, 7.209019, 13....
## $ percchildbelowpovert <dbl> 18.011717, 45.826514, 14.036061, 11.179536, 13...
## $ percadultpoverty     <dbl> 11.009776, 27.385647, 10.852090, 5.536013, 11....
## $ percelderlypoverty   <dbl> 12.443812, 25.228976, 12.697410, 6.217047, 19....
## $ inmetro              <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1...
## $ category             <chr> "AAR", "LHR", "AAR", "ALU", "AAR", "AAR", "LAR...

midwest %>% 
  ggplot(aes(area,poptotal)) +
  geom_point()
midwest %>% 
  filter(poptotal<1e+06) %>% 
  ggplot(aes(area,poptotal)) +
  geom_point()

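We get a basic scatterplot where each point represents a county. However, it lacks some basic components such as a plot title and meaningful axis labels. Moreover, most of the points are concentrated in the bottom portion of the plot, which is not ideal. You will see how to rectify these in the upcoming steps.

Like geom_point(), there are many such geom layers, which we will see in later parts of this tutorial series. For now, let's add a smoothing layer using geom_smooth(method='lm'). Since method is set to lm (short for linear model), it draws the line of best fit.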

g <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands

plot(g)

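The line of best fit is in blue. Can you find out what other options are available for method? (Note: see ?geom_smooth.) You might have noticed that most of the points lie in the bottom portion of the chart, which does not look nice. So let's change the Y axis limits to focus on the lower half.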
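As one possible answer to the question about method, geom_smooth() also accepts "loess" (local regression), among the other options listed in ?geom_smooth. The sketch below is illustrative only; p_loess is a name made up for this example.

```r
library(ggplot2)
data("midwest", package = "ggplot2")

p_loess <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE)  # local regression curve, confidence band off

plot(p_loess)
```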

3. Adjusting the X and Y axis limits

g <- ggplot(midwest, aes(x=area, y=poptotal)) +
  geom_point() +
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands
g
# Delete the points outside the limits
g + xlim(c(0, 0.1)) + ylim(c(0, 1000000))   # deletes points

## Warning: Removed 5 rows containing non-finite values (stat_smooth).

## Warning: Removed 5 rows containing missing values (geom_point).

# g + xlim(0, 0.1) + ylim(0, 1000000)   # deletes points
g + coord_cartesian(ylim = c(0,1000000))

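In this case, the chart was not built from scratch but on top of g. This is because the previous plot was stored as g, a ggplot object, which reproduces the original plot when called. With ggplot, you can add more layers, themes, and other settings on top of this plot.

Did you notice that the line of best fit became more horizontal compared to the original plot? This is because, when using xlim() and ylim(), the points outside the specified range are deleted and are not considered when drawing the line of best fit (using geom_smooth(method='lm')). This feature might come in handy when you wish to know how the line of best fit would change when some extreme values (or outliers) are removed.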
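The same effect can be checked numerically outside ggplot. The sketch below assumes that geom_smooth(method='lm') fits poptotal against area in the same way a plain lm() call does. The names inside, slope_all and slope_inside are introduced here for illustration.

```r
data("midwest", package = "ggplot2")

# Rows that survive xlim(c(0, 0.1)) and ylim(c(0, 1000000))
inside <- subset(midwest, area >= 0 & area <= 0.1 &
                          poptotal >= 0 & poptotal <= 1000000)

slope_all    <- coef(lm(poptotal ~ area, data = midwest))[["area"]]
slope_inside <- coef(lm(poptotal ~ area, data = inside))[["area"]]

nrow(midwest) - nrow(inside)                # rows dropped by the limits
c(all = slope_all, inside = slope_inside)   # slope with and without those rows
```

Comparing the two slopes shows how much the few dropped counties influence the fit.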

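The other method is to change the X and Y axis limits by zooming in on the region of interest without deleting any points. This is done using coord_cartesian().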

g <- ggplot(midwest, aes(x=area, y=poptotal)) +
  geom_point() +
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands

# Zoom in without deleting the points outside the limits. 
# As a result, the line of best fit is the same as the original plot.
g1 <- g + coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000))  # zooms in
plot(g1)  # only the visible range shrinks; the fitted line is unchanged

Since all points were considered, the line of best fit did not change.
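One way to verify this claim is to compare the values ggplot2 computes for the smoothing layer with and without the zoom. This is a sketch under the assumption that layer_data() returns the computed data for a given layer; g_zoom, fit_full and fit_zoom are illustrative names.

```r
library(ggplot2)
data("midwest", package = "ggplot2")

g <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point() +
  geom_smooth(method = "lm")
g_zoom <- g + coord_cartesian(xlim = c(0, 0.1), ylim = c(0, 1000000))

# Layer 2 is geom_smooth; layer_data() returns the values computed for it
fit_full <- layer_data(g, 2)
fit_zoom <- layer_data(g_zoom, 2)

all.equal(fit_full$y, fit_zoom$y)  # compares fitted values with and without the zoom
```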

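4. How to change the title and axis labels

I have stored this as g1. Let's add the plot title and labels for the X and Y axes. This can be done in one go using the labs() function with the title, x and y arguments. Another option is to use ggtitle(), xlab() and ylab().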

g <- ggplot(midwest, aes(x=area, y=poptotal)) +
  geom_point() + 
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands
g
g1 <- g + coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000))  # zooms in

# Add Title and Labels
g1 + labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")

# or

g1 + ggtitle("Area Vs Population", subtitle="From midwest dataset") + xlab("Area") + ylab("Population")

Excellent! So here is the full function call.

# Full Plot call
library(ggplot2)
ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm") + 
  coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics") +
  theme(plot.title = element_text(size = 20,hjust = 0.5),
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))

5. How to change the color and size of points

We can change the aesthetics of a geom layer by modifying the respective geom.

ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(col="steelblue", size=3) +   # Set static color and size for points  
  geom_smooth(method="lm", col="firebrick") +  # change the color of line
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics") +
  theme(plot.title = element_text(size = 20,hjust = 0.5),
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))

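How to change the color to reflect categories in another column?

Suppose we want the color to change based on another column in the source dataset (midwest). In that case, the color must be specified inside the aes() function.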

gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics") +
  theme(plot.title = element_text(size = 20,hjust = 0.5),
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))

plot(gg)

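Now each point is colored based on the state it belongs to, because of aes(col=state). Besides color, the size, shape, stroke (thickness of the boundary) and fill (fill color) aesthetics can also be used to discriminate groupings.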
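As a brief illustration of the other aesthetics, the sketch below maps both color and shape to state. It is only an example; p_groups is a name introduced here.

```r
library(ggplot2)
data("midwest", package = "ggplot2")

p_groups <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point(aes(col = state, shape = state), size = 3) +  # color and shape both vary by state
  coord_cartesian(xlim = c(0, 0.1), ylim = c(0, 1000000))

plot(p_groups)
```

Mapping two aesthetics to the same column keeps the groups distinguishable even when the plot is printed in grayscale.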

As an added benefit, the legend is added automatically. If needed, it can be removed by setting legend.position to "none" inside a theme() function.

gg + theme(legend.position="none")  # remove legend

You can also change the color palette entirely.

gg + scale_color_brewer(palette = "Set1")

More such palettes can be found in the RColorBrewer package.

library(RColorBrewer)
head(brewer.pal.info, 10)  # show 10 palettes

##          maxcolors category colorblind
## BrBG            11      div       TRUE
## PiYG            11      div       TRUE
## PRGn            11      div       TRUE
## PuOr            11      div       TRUE
## RdBu            11      div       TRUE
## RdGy            11      div      FALSE
## RdYlBu          11      div       TRUE
## RdYlGn          11      div      FALSE
## Spectral        11      div      FALSE
## Accent           8     qual      FALSE

knitr::include_graphics("color_palettes.png",error = FALSE)
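The brewer.pal.info table can also be used to narrow down the choice of palette. The sketch below filters for qualitative palettes that are flagged as colorblind-friendly and applies the first match. The names qual_cb and pal are illustrative, and taking the first row is an arbitrary choice made for this example.

```r
library(ggplot2)
library(RColorBrewer)
data("midwest", package = "ggplot2")

# Qualitative palettes flagged as colorblind-friendly in brewer.pal.info
qual_cb <- subset(brewer.pal.info, category == "qual" & colorblind)
rownames(qual_cb)

# Apply the first matching palette (an arbitrary pick for illustration)
pal <- rownames(qual_cb)[1]
ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point(aes(col = state), size = 3) +
  scale_color_brewer(palette = pal)
```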

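6. How to change the X axis text and tick locations

6.1 How to change the X and Y axis text and its location?

Alright, now let's see how to change the X and Y axis text and its location. This involves two aspects: breaks and labels.

Step 1: Set the breaks. The breaks should be on the same scale as the X axis variable. Note that scale_x_continuous is used because the X axis variable is continuous. If it were a date variable, scale_x_date could be used. Like scale_x_continuous(), an equivalent scale_y_continuous() is available for the Y axis.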

# Base plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics") +
  theme(plot.title = element_text(size = 20,hjust = 0.5),
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))
gg
# Change breaks
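gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01))  # a break every 0.01

Step 2: Change the labels. You can optionally change the labels at the axis ticks. labels takes a vector of the same length as breaks. Let me demonstrate by setting the labels to the letters a to k (though it makes no sense in this context).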

# Base Plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population",
       subtitle="From midwest dataset", 
       y="Population", 
       x="Area",
       caption="Midwest Demographics") +
  theme(plot.title = element_text(size = 20,hjust = 0.5),
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))

# Change breaks + label
gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01), labels = letters[1:11])  # a labels vector needs breaks of the same length
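Because labels must match breaks in length, one way to avoid a mismatch is to build the breaks once and derive the labels from them. This is a minimal sketch; brks and labs_x are names introduced for this example, and the plot is reduced to the essentials.

```r
library(ggplot2)
data("midwest", package = "ggplot2")

brks   <- seq(0, 0.1, 0.01)           # break positions on the X axis
labs_x <- letters[seq_along(brks)]    # exactly one label per break

stopifnot(length(brks) == length(labs_x))  # guard against a length mismatch

ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point() +
  scale_x_continuous(breaks = brks, labels = labs_x)
```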

If you need to reverse the scale, use scale_x_reverse().

gg <- ggplot(midwest, aes(x=area, y=poptotal)) +    # map X and Y
  geom_point(aes(col=state), size=3) +  # scatterplot: color varies by state, fixed size
  geom_smooth(method="lm", col="firebrick", size=2) + # line of best fit
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) +  # zoom in without changing the fit
  labs(title="Area Vs Population",
       subtitle="From midwest dataset", 
       y="Population", 
       x="Area",
       caption="Midwest Demographics") +
  theme(plot.title = element_text(size = 20,hjust = 0.5),  
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))
gg + scale_x_reverse()

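6.2 How to write customized text for axis labels by formatting the original values?

Let's set the breaks for the Y axis text as well, and format the X and Y axis labels. I have used 2 methods for formatting labels:

Method 1: Using sprintf(). (In the example below, the values are formatted with a % sign)
Method 2: Using a custom user-defined function. (Formats 1000s on a 1K scale)

Use whichever method you find convenient.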

# Base Plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population",
       subtitle="From midwest dataset",
       y="Population", 
       x="Area", 
       caption="Midwest Demographics") +
  theme(plot.title = element_text(size = 20,hjust = 0.5),  
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))
gg
# Change Axis Texts
gg + 
  scale_x_continuous(breaks=seq(0, 0.1, 0.01), labels = sprintf("%1.2f%%", seq(0, 0.1, 0.01))) + 
  scale_y_continuous(breaks=seq(0, 1000000, 200000), labels = function(x){paste0(x/1000, 'K')})

gg + 
  scale_x_continuous(breaks=seq(0, 0.1, 0.01), labels = stringr::str_c(seq(0,0.1,0.01),"%")) + 
  scale_y_continuous(breaks=seq(0, 1000000, 200000), labels = function(x){stringr::str_c(x/1000, 'K')})
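To see what the two formatting approaches produce, they can be run on a few values outside the plot. This is a small sketch; x_breaks and to_k are names introduced here. Note that sprintf() only appends the % sign and does not multiply the values by 100.

```r
# Method 1: sprintf() with two decimals and a literal percent sign
x_breaks <- seq(0, 0.1, 0.05)
sprintf("%1.2f%%", x_breaks)    # "0.00%" "0.05%" "0.10%"

# Method 2: a user-defined function that rescales to thousands
to_k <- function(x) paste0(x / 1000, "K")
to_k(c(0, 200000, 1000000))     # "0K" "200K" "1000K"
```

Passing a function to labels, as in Method 2, has the advantage that it keeps working if the breaks are changed later.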

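6.3 How to customize the entire theme using pre-built themes?

Finally, instead of changing the theme components individually (which I discuss in detail in Part 2), we can change the entire theme itself using pre-built themes. The help page ?theme_bw shows all the available built-in themes.

There are two ways to do this. Use theme_set() to set the theme before drawing the ggplot; note that this setting will affect all future plots. Or, draw the ggplot and then add the overall theme setting (e.g. theme_bw()).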

# Base plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population",
       subtitle="From midwest dataset",
       y="Population",
       x="Area", 
       caption="Midwest Demographics") +
  theme(plot.title = element_text(size = 20,hjust = 0.5),  
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))

gg + ggthemes::theme_economist() +
  theme(plot.title = element_text(size = 20,hjust = 0.5),  
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))

gg + 
  ggthemes::theme_economist_white() +
  theme(plot.title = element_text(size = 20,hjust = 0.5),  
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))

gg <- gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01))

# method 1: Using theme_set()
# theme_set(theme_classic())  # not run
# gg

# method 2: Adding theme Layer itself.
gg + theme_bw() + labs(subtitle="BW Theme") + 
   theme(plot.title = element_text(size = 20,hjust = 0.5),  
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))
gg + theme_classic() + labs(subtitle="Classic Theme") +
   theme(plot.title = element_text(size = 20,hjust = 0.5),  
        axis.title = element_text(size = 15),
        axis.text = element_text(size = 10))
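Since theme_set() changes the default for every later plot, it can be useful to keep the previous theme and restore it afterwards. The sketch below relies on theme_set() returning the theme that was active before the call; old_theme is an illustrative name.

```r
library(ggplot2)

old_theme <- theme_set(theme_classic())  # theme_set() returns the previous default theme

# ... plots drawn here use theme_classic() unless a theme is added explicitly ...

theme_set(old_theme)                     # restore the earlier default
```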

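For more customized themes, have a look at the ggthemes package and the ggthemr package.

That covers the basics. We are now ready to tackle more advanced customization. In Part 2 of the ggplot tutorial, I discuss modifying theme components, manipulating legends, annotations, faceting, and custom layouts.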