ggplot2_Chapter7_Positioning
7.1 Introduction
Facetting is a mechanism for automatically laying out multiple plots on a page. It splits the data into subsets, and then plots each subset in a different panel. Such plots are often called small multiples or trellis graphics (Sect. 7.2).
Coordinate systems control how the two independent position scales are combined to create a 2d coordinate system. The most common coordinate system is Cartesian, but other coordinate systems can be useful in special circumstances (Sect. 7.3).
7.2 Facetting
library(tidyverse)
mpg2 <- mpg %>%
dplyr::filter(cyl != 5,drv %in% c("4", "f"),class != "2seater")
mpg %>% head()
# A tibble: 6 x 11
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa~
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa~
3 audi a4 2 2008 4 manual(m6) f 20 31 p compa~
4 audi a4 2 2008 4 auto(av) f 21 30 p compa~
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa~
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa~
7.2.1 Facet Wrap
7.2.2 Facet Grid
4 6 8
81 75 49
7.2.3 Controlling Scales
For both facet_wrap() and facet_grid(), the scales parameter controls whether the position scales are the same in all panels (fixed) or allowed to vary between panels (free):
scales = "fixed": x and y scales are fixed across all panels.
scales = "free_x": the x scale is free, and the y scale is fixed.
scales = "free_y": the y scale is free, and the x scale is fixed.
scales = "free": x and y scales vary across panels.
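As a rough sketch of how these settings differ in practice, the block below builds the same scatterplot once with fixed scales and once with free scales. It assumes only ggplot2 and its bundled mpg data; the names p_demo, p_fixed and p_free are illustrative and do not appear elsewhere in these notes.

```r
library(ggplot2)

# Illustrative base plot: city vs highway mileage from the bundled mpg data
p_demo <- ggplot(mpg, aes(cty, hwy)) +
  geom_point()

# Default behaviour: every panel shares the same x and y axes
p_fixed <- p_demo + facet_wrap(~cyl, scales = "fixed")

# Each panel gets its own x and y axis ranges
p_free <- p_demo + facet_wrap(~cyl, scales = "free")

p_fixed
p_free
```

Fixed scales make it easier to compare patterns across panels, while free scales make it easier to see detail within each panel.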
p <- ggplot(mpg2, aes(cty, hwy)) +
geom_abline() +
geom_jitter(width = 0.5, height = 0.5)
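p + facet_wrap(~cyl)
Free scales are also useful when we want to display multiple time series that were measured on different scales. To do this, we first need to change from "wide" to "long" data, stacking the separate variables into a single column. An example is shown below with the long form of the economics data; the topic is discussed in more detail in Sect. 9.3.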
# A tibble: 6 x 4
date variable value value01
<date> <chr> <dbl> <dbl>
1 1967-07-01 pce 507. 0
2 1967-08-01 pce 510. 0.000265
3 1967-09-01 pce 516. 0.000762
4 1967-10-01 pce 512. 0.000471
5 1967-11-01 pce 517. 0.000916
6 1967-12-01 pce 525. 0.00157
ggplot(economics_long, aes(date, value)) +
geom_line() +
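facet_wrap(~variable, scales = "free_y", ncol = 1)
facet_grid() has an additional parameter called space, which takes the same values as scales. When space is "free", each column (or row) has a width (or height) proportional to the range of the scale for that column (or row). This makes the scaling equal across the whole plot: 1 cm on each panel maps to the same range of data. (This is somewhat analogous to the "sliced" axis limits of lattice.) For example, if panel a had range 2 and panel b had range 4, one-third of the space would be given to a and two-thirds to b. This is most useful for categorical scales, where we can assign space proportionally based on the number of levels in each facet, as illustrated below.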
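The proportional allocation in the two-panel example can be checked with a few lines of arithmetic. This is only an illustration of the rule; the ranges are the hypothetical values from the text, and panel_range and panel_share are made-up names.

```r
# Hypothetical scale ranges for two panels, taken from the example in the text
panel_range <- c(a = 2, b = 4)

# With space = "free", each panel's share of the available space is
# proportional to the range of its scale
panel_share <- panel_range / sum(panel_range)
panel_share
```

Panel a receives one third of the space and panel b two thirds, matching the description above.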
[1] a4 a4 a4
[4] a4 a4 a4
[7] a4 a4 quattro a4 quattro
[10] a4 quattro a4 quattro a4 quattro
[13] a4 quattro a4 quattro a4 quattro
[16] a6 quattro a6 quattro a6 quattro
[19] k1500 tahoe 4wd k1500 tahoe 4wd k1500 tahoe 4wd
[22] k1500 tahoe 4wd malibu malibu
[25] malibu malibu malibu
[28] caravan 2wd caravan 2wd caravan 2wd
[31] caravan 2wd caravan 2wd caravan 2wd
[34] caravan 2wd caravan 2wd caravan 2wd
[37] caravan 2wd caravan 2wd dakota pickup 4wd
[40] dakota pickup 4wd dakota pickup 4wd dakota pickup 4wd
[43] dakota pickup 4wd dakota pickup 4wd dakota pickup 4wd
[46] dakota pickup 4wd dakota pickup 4wd durango 4wd
[49] durango 4wd durango 4wd durango 4wd
[52] durango 4wd durango 4wd durango 4wd
[55] ram 1500 pickup 4wd ram 1500 pickup 4wd ram 1500 pickup 4wd
[58] ram 1500 pickup 4wd ram 1500 pickup 4wd ram 1500 pickup 4wd
[61] ram 1500 pickup 4wd ram 1500 pickup 4wd ram 1500 pickup 4wd
[64] ram 1500 pickup 4wd explorer 4wd explorer 4wd
[67] explorer 4wd explorer 4wd explorer 4wd
[70] explorer 4wd f150 pickup 4wd f150 pickup 4wd
[73] f150 pickup 4wd f150 pickup 4wd f150 pickup 4wd
[ reached getOption("max.print") -- omitted 130 entries ]
attr(,"scores")
4runner 4wd a4 a4 quattro
15.16667 18.85714 17.12500
a6 quattro altima camry
16.00000 20.66667 19.85714
camry solara caravan 2wd civic
19.85714 15.81818 24.44444
corolla dakota pickup 4wd durango 4wd
25.60000 12.77778 11.85714
explorer 4wd f150 pickup 4wd forester awd
13.66667 13.00000 18.83333
grand cherokee 4wd grand prix gti
13.50000 17.00000 20.00000
impreza awd jetta k1500 tahoe 4wd
19.62500 21.28571 12.50000
land cruiser wagon 4wd malibu maxima
12.00000 18.80000 18.66667
mountaineer 4wd new beetle passat
13.25000 26.00000 18.57143
pathfinder 4wd ram 1500 pickup 4wd range rover
13.75000 11.40000 11.50000
sonata tiburon toyota tacoma 4wd
19.00000 18.28571 15.57143
33 Levels: ram 1500 pickup 4wd range rover ... new beetle
mpg2$model <- reorder(mpg2$model, mpg2$cty)
mpg2$manufacturer <- reorder(mpg2$manufacturer, -mpg2$cty)
ggplot(mpg2, aes(cty, model)) +
geom_point() +
facet_grid(manufacturer~.,
scales = "free",
space = "free") +
theme(strip.text.y = element_text(angle = 0))
7.2.4 Missing Facetting Variables
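If you are using facetting on a plot with multiple datasets, what happens when one of those datasets is missing the facetting variables? This situation commonly arises when you are adding contextual information that should be the same in all panels. For example, imagine you have a spatial display of disease facetted by gender. What happens when you add a map layer that does not contain the gender variable? Here ggplot does what you would expect: it displays the map in every facet, because missing facetting variables are treated as if they had all values.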
df1 <- data.frame(x = 1:3, y = 1:3, gender = c("f", "f", "m"))
df2 <- data.frame(x = 2, y = 2)
ggplot(df1, aes(x, y)) +
geom_point(data = df2, colour = "red", size = 5) +
geom_point() +
facet_wrap(~gender)
This technique is particularly useful when you add annotations to make it easier to compare between facets, as shown in the next section.
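One way to confirm that a layer without the facetting variable is repeated in every panel is to inspect the data ggplot2 builds for that layer. The sketch below reuses the two data frames from the example above and assumes layer_data() from ggplot2; p_ctx and red_layer are illustrative names.

```r
library(ggplot2)

df1 <- data.frame(x = 1:3, y = 1:3, gender = c("f", "f", "m"))
df2 <- data.frame(x = 2, y = 2)  # no gender column

p_ctx <- ggplot(df1, aes(x, y)) +
  geom_point(data = df2, colour = "red", size = 5) +
  geom_point() +
  facet_wrap(~gender)

# Built data for the first layer (the red point): it has one row per panel,
# even though df2 itself has a single row
red_layer <- layer_data(p_ctx, 1)
red_layer[, c("x", "y", "PANEL")]
```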
7.2.5 Grouping vs. Facetting
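Facetting is an alternative to using aesthetics (such as colour, shape or size) to differentiate groups. Both techniques have strengths and weaknesses, which depend on the relative positions of the subsets. With facetting, each group sits in its own panel, far from the others, and there is no overlap between groups. This is good if the groups overlap a lot, but it does make small differences harder to see. When aesthetics are used to differentiate groups, the groups are close together and may overlap, but small differences are easier to see.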
df <- data.frame(
x = rnorm(120, c(0, 2, 4)),
y = rnorm(120, c(1, 2, 1)),
z = letters[1:3]
)
ggplot(df, aes(x, y)) +
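geom_point(aes(colour = z))
Comparisons between facets often benefit from some thoughtful annotation. For example, in this case we could show the mean of each group in every panel. You will learn how to write summary code like this in Chap. 10. Note that we need two "z" variables: one for the facets and one for the colours.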
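The summary data frame df_sum used in the following plots is printed below but never defined in these notes. A plausible reconstruction, assuming it follows the same group_by() and summarise() pattern as the diamonds2 summary in the exercises, is sketched here. The seed is an arbitrary addition for reproducibility, so the means will not match the printed values exactly.

```r
library(dplyr)

set.seed(1)  # arbitrary seed; the notes do not set one
df <- data.frame(
  x = rnorm(120, c(0, 2, 4)),
  y = rnorm(120, c(1, 2, 1)),
  z = letters[1:3]
)

# One row per group holding the mean position. z is renamed to z2 so that the
# facetting variable (z) and the colour variable (z2) stay separate.
df_sum <- df %>%
  group_by(z) %>%
  summarise(x = mean(x), y = mean(y)) %>%
  rename(z2 = z)
df_sum
```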
# A tibble: 3 x 3
z2 x y
<fct> <dbl> <dbl>
1 a 0.212 0.832
2 b 2.12 1.69
3 c 3.82 1.02
# Facet by z: df_sum has no z column, so all three group means appear in every panel
ggplot(df, aes(x, y)) +
geom_point() +
geom_point(data = df_sum, aes(colour = z2), size = 6) +
facet_wrap(~z)
# Facet by z2: df has no z2 column, so all the raw points appear in every panel
ggplot(df, aes(x, y)) +
geom_point() +
geom_point(data = df_sum, aes(colour = z2), size = 6) +
facet_wrap(~z2)
Another useful technique is to put all the data in the background of each panel:
df2 <- dplyr::select(df, -z)
ggplot(df, aes(x, y)) +
geom_point(data = df2, colour = "grey70") + # df2 has no z, so it is drawn in every panel
geom_point(aes(colour = z)) +
facet_wrap(~z)
7.2.6 Continuous Variables
To facet by a continuous variable, you must first discretise it. ggplot2 provides three helper functions for this:
- Divide the data into n bins each of the same length: cut_interval(x, n).
- Divide the data into bins of width width: cut_width(x, width).
- Divide the data into n bins each containing (approximately) the same number of points: cut_number(x, n = 10).
They are illustrated below:
# Bins of width 1
mpg2$disp_w <- cut_width(mpg2$displ, 1)
# Six bins of equal length
mpg2$disp_i <- cut_interval(mpg2$displ, 6)
# Six bins containing equal numbers of points
mpg2$disp_n <- cut_number(mpg2$displ, 6)
plot <- ggplot(mpg2, aes(cty, hwy)) +
geom_point() +
labs(x = NULL, y = NULL)
plot + facet_wrap(~disp_w, nrow = 1)
Note: the facetting formula does not evaluate functions, so you must first create a new variable containing the discretised data.
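To see how the three helpers differ, it can help to count the observations that fall into each bin. The sketch below rebuilds the same subset as mpg2 with base R so that it runs on its own; mpg_sub is an illustrative name.

```r
library(ggplot2)

# Same filter as mpg2 in these notes, rebuilt here so the sketch is self-contained
mpg_sub <- subset(mpg, cyl != 5 & drv %in% c("4", "f") & class != "2seater")

table(cut_width(mpg_sub$displ, 1))     # bins one unit wide; counts vary
table(cut_interval(mpg_sub$displ, 6))  # six bins of equal length; counts vary
table(cut_number(mpg_sub$displ, 6))    # six bins with roughly equal counts
```

The first two calls give bins of constant width with uneven counts, while cut_number() gives bins of uneven width with roughly even counts.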
7.2.7 Exercises
- Diamonds: display the distribution of price conditional on cut and carat. Try facetting by cut and grouping by carat. Try facetting by carat and grouping by cut. Which do you prefer?
# A tibble: 6 x 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
- Diamonds: compare the relationship between price and carat for each colour. What makes it hard to compare the groups? Is grouping better or facetting? If you use facetting, what annotation might you add to make it easier to see the differences between panels?
diamonds2 <- diamonds %>%
group_by(color) %>%
summarise(carat = mean(carat,na.rm = TRUE),
price = mean(price,na.rm = TRUE)) %>%
rename(color2 = color)
diamonds2
# A tibble: 7 x 3
color2 carat price
<ord> <dbl> <dbl>
1 D 0.658 3170.
2 E 0.658 3077.
3 F 0.737 3725.
4 G 0.771 3999.
5 H 0.912 4487.
6 I 1.03 5092.
7 J 1.16 5324.
diamonds %>% ggplot(aes(price,carat)) +
geom_point() +
geom_point(data = diamonds2,aes(col = color2),size = 6) +
facet_wrap(~color2)
- Why is facet_wrap() generally more useful than facet_grid()?
- Recreate the following plot. It facets mpg2 by class, overlaying a smooth curve fit to the full dataset.
mpg2 %>%
ggplot(aes(displ,hwy)) +
geom_point() +
geom_smooth(data = mpg2 %>% select(-class)) + # layer data without the facetting variable, so the smooth appears in every panel
facet_wrap(~class)
7.3 Coordinate Systems
7.4 Linear Coordinate Systems
There are three linear coordinate systems: coord_cartesian(), coord_flip(), coord_fixed().
7.4.1 Zooming into a Plot with coord_cartesian()
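The difference between zooming with the coordinate system and restricting the scale can be sketched as follows. The definition of base here is an assumption (a scatterplot of the bundled mpg data with a smooth), since these notes do not show how it was built at this point; p_limits and p_zoom are illustrative names.

```r
library(ggplot2)

# Assumed definition of `base`: a scatterplot with a smooth
base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

# Scale limits: values outside 5 to 7 are dropped before the smooth is fitted
p_limits <- base + scale_x_continuous(limits = c(5, 7))

# Coordinate limits: the smooth is fitted to all the data, then the view is cropped
p_zoom <- base + coord_cartesian(xlim = c(5, 7))

p_limits
p_zoom
```

Because the scale version discards data, its smooth is fitted to fewer points and may differ noticeably from the zoomed version.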
# Zooming to 5--7 keeps all the data but only shows some of it
base + coord_cartesian(xlim = c(5, 7))
7.4.2 Flipping the Axes with coord_flip()
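Most statistics and geoms assume you are interested in y values conditional on x values (e.g., smooth, summary, boxplot, line): in most statistical models, the x values are assumed to be measured without error. If you are interested in x conditional on y (or you just want to rotate the plot 90 degrees), you can use coord_flip() to exchange the x and y axes. Compare this with just exchanging the variables mapped to x and y.
coord_flip() fits the smooth to the original data, and then rotates the output.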
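A minimal sketch of that comparison, assuming the bundled mpg data and the default smoother; the plot names are illustrative:

```r
library(ggplot2)

# Smooth of cty conditional on displ
p_orig <- ggplot(mpg, aes(displ, cty)) +
  geom_point() +
  geom_smooth()

# Swapping the mappings refits the smooth: displ conditional on cty
p_swapped <- ggplot(mpg, aes(cty, displ)) +
  geom_point() +
  geom_smooth()

# coord_flip() keeps the original fit and only rotates the drawing
p_flipped <- p_orig + coord_flip()

p_swapped
p_flipped
```

The two plots share the same axes, but the curves differ because only p_swapped changes which variable is treated as the response.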
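7.4.3 Equal Scales with coord_fixed()
coord_fixed() fixes the ratio of lengths on the x and y axes. The default ratio ensures that the x and y axes have equal scales: 1 cm along the x axis represents the same range of data as 1 cm along the y axis. The aspect ratio is also set so that this mapping is maintained regardless of the shape of the output device. See the documentation of coord_fixed() for more details.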
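As a small illustration, assuming the bundled mpg data, the sketch below draws city against highway mileage with equal scales and then with a non-default ratio. The plot names are illustrative.

```r
library(ggplot2)

p_mileage <- ggplot(mpg, aes(cty, hwy)) +
  geom_point() +
  geom_abline()

# Default ratio = 1: one unit on x is drawn as long as one unit on y,
# so the reference line y = x sits at 45 degrees
p_equal <- p_mileage + coord_fixed()

# ratio is expressed as y / x: here one unit on y is drawn twice as long
# as one unit on x
p_tall <- p_mileage + coord_fixed(ratio = 2)

p_equal
p_tall
```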
7.5 Non-linear Coordinate Systems
rect <- data.frame(x = 50, y = 50)
line <- data.frame(x = c(1, 200), y = c(100, 1))
base <- ggplot(mapping = aes(x, y)) +
geom_tile(data = rect, aes(width = 50, height = 50)) +
geom_line(data = line) +
xlab(NULL) + ylab(NULL)
base
- We start with a line parameterised by its two endpoints:
df <- data.frame(r = c(0, 1), theta = c(0, 3 / 2 * pi))
ggplot(df, aes(r, theta)) +
geom_line(size = 2) +
geom_point(size = 6, colour = "red")
- We break it into multiple line segments, each with two endpoints.
# Evenly spaced values between the two ends of rng
interp <- function(rng, n) {
seq(rng[1], rng[2], length.out = n)
}
munched <- data.frame(
r = interp(df$r, 15),
theta = interp(df$theta, 15)
)
ggplot(munched, aes(r, theta)) +
geom_line() +
geom_point(size = 2, colour = "red")
- We transform the locations of each piece:
transformed <- transform(munched,
x = r * sin(theta),
y = r * cos(theta)
)
ggplot(transformed, aes(x, y)) +
geom_path() +
geom_point(size = 2, colour = "red") +
coord_fixed()
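To see why the line is broken into pieces before it is transformed, one can measure the length of the transformed path for different numbers of points. This is only an illustrative sketch using the same endpoints and transformation as above; path_length() is a hypothetical helper, not part of ggplot2.

```r
# Length of the transformed path when the line is cut into n points
path_length <- function(n) {
  r <- seq(0, 1, length.out = n)
  theta <- seq(0, 3 / 2 * pi, length.out = n)
  x <- r * sin(theta)
  y <- r * cos(theta)
  # Sum of the straight-line distances between consecutive points
  sum(sqrt(diff(x)^2 + diff(y)^2))
}

lens <- sapply(c(2, 15, 100), path_length)
lens
```

With only the two endpoints the transformed path is a straight segment, which badly understates the length of the spiral. Adding points lets the straight pieces follow the curve more closely, so the measured length grows towards the true arc length.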