ggplot2_Chapter7_Positioning

7.1 Introduction
7.2 Facetting
7.3 Coordinate Systems
7.4 Linear Coordinate Systems
7.5 Non-linear Coordinate Systems

7.1 Introduction

Facetting is a mechanism for automatically laying out multiple plots on a page. It splits the data into subsets, and then plots each subset in a different panel. Such plots are often called small multiples or trellis graphics (Sect. 7.2).
Coordinate systems control how the two independent position scales are combined to create a 2d coordinate system. The most common coordinate system is Cartesian, but other coordinate systems can be useful in special circumstances (Sect. 7.3).

7.2 Facetting

library(tidyverse)

mpg2 <- mpg %>% 
  dplyr::filter(cyl != 5, drv %in% c("4", "f"), class != "2seater")

mpg %>% head()

# A tibble: 6 x 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa~
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa~
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa~
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa~
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa~
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa~

7.2.1 Facet Wrap

base <- ggplot(mpg2, aes(displ, hwy)) +
  geom_point()

base

base + facet_wrap(~class, ncol = 3)

base + facet_wrap(~class, ncol = 3, as.table = FALSE)

base + facet_wrap(~class, nrow = 3)

base + facet_wrap(~class, nrow = 3, dir = "v")

7.2.2 Facet Grid

table(mpg2$cyl)


 4  6  8 
81 75 49

base + facet_grid(.~cyl)

base + facet_grid(cyl~.)

base + facet_grid(drv~cyl)

7.2.3 Controlling Scales

For both facet_wrap() and facet_grid(), the scales argument controls whether the position scales are the same in all panels (fixed) or allowed to vary between panels (free):

scales = "fixed": x and y scales are fixed across all panels.
scales = "free_x": the x scale is free, and the y scale is fixed.
scales = "free_y": the y scale is free, and the x scale is fixed.
scales = "free": x and y scales vary across panels.

p <- ggplot(mpg2, aes(cty, hwy)) +
    geom_abline() +
    geom_jitter(width = 0.5, height = 0.5)

p + facet_wrap(~cyl)

p + facet_wrap(~cyl, scales = "free")
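
The two intermediate settings are not plotted above. A minimal sketch, assuming ggplot2 is installed and using the built-in mpg data (rather than mpg2) so that it runs on its own; the object names p_free_x and p_free_y are illustrative only:

```r
library(ggplot2)

p <- ggplot(mpg, aes(cty, hwy)) +
  geom_abline() +
  geom_jitter(width = 0.5, height = 0.5)

# Only the x scale varies between panels; all panels share one y scale
p_free_x <- p + facet_wrap(~cyl, scales = "free_x")

# Only the y scale varies between panels; all panels share one x scale
p_free_y <- p + facet_wrap(~cyl, scales = "free_y")

p_free_x
p_free_y
```

Freeing a single axis keeps panels comparable along the other one, which is often a reasonable middle ground between "fixed" and "free".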

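Free scales are also useful when we want to display multiple time series that were measured on different scales. To do this, we first need to change the data from "wide" to "long" form, stacking the separate variables into a single column. An example using the long form of the economics data is shown below; the topic is discussed in more detail in Sect. 9.3.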

economics_long %>% head()

# A tibble: 6 x 4
  date       variable value  value01
  <date>     <chr>    <dbl>    <dbl>
1 1967-07-01 pce       507. 0       
2 1967-08-01 pce       510. 0.000265
3 1967-09-01 pce       516. 0.000762
4 1967-10-01 pce       512. 0.000471
5 1967-11-01 pce       517. 0.000916
6 1967-12-01 pce       525. 0.00157

ggplot(economics_long, aes(date, value)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y", ncol = 1)

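facet_grid() has an additional argument called space, which takes the same values as scales. When space is "free", each column (or row) has a width (or height) proportional to the range of the scale for that column (or row). This makes the scaling equal across the whole plot: 1 cm on each panel maps to the same range of data. (This is somewhat analogous to the "sliced" axis limits of lattice.) For example, if panel a had range 2 and panel b had range 4, one-third of the space would be given to a and two-thirds to b. This is most useful for categorical scales, where we can assign space proportionally based on the number of levels in each facet, as illustrated below.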

reorder(mpg2$model, mpg2$cty)

 [1] a4                  a4                  a4                 
 [4] a4                  a4                  a4                 
 [7] a4                  a4 quattro          a4 quattro         
[10] a4 quattro          a4 quattro          a4 quattro         
[13] a4 quattro          a4 quattro          a4 quattro         
[16] a6 quattro          a6 quattro          a6 quattro         
[19] k1500 tahoe 4wd     k1500 tahoe 4wd     k1500 tahoe 4wd    
[22] k1500 tahoe 4wd     malibu              malibu             
[25] malibu              malibu              malibu             
[28] caravan 2wd         caravan 2wd         caravan 2wd        
[31] caravan 2wd         caravan 2wd         caravan 2wd        
[34] caravan 2wd         caravan 2wd         caravan 2wd        
[37] caravan 2wd         caravan 2wd         dakota pickup 4wd  
[40] dakota pickup 4wd   dakota pickup 4wd   dakota pickup 4wd  
[43] dakota pickup 4wd   dakota pickup 4wd   dakota pickup 4wd  
[46] dakota pickup 4wd   dakota pickup 4wd   durango 4wd        
[49] durango 4wd         durango 4wd         durango 4wd        
[52] durango 4wd         durango 4wd         durango 4wd        
[55] ram 1500 pickup 4wd ram 1500 pickup 4wd ram 1500 pickup 4wd
[58] ram 1500 pickup 4wd ram 1500 pickup 4wd ram 1500 pickup 4wd
[61] ram 1500 pickup 4wd ram 1500 pickup 4wd ram 1500 pickup 4wd
[64] ram 1500 pickup 4wd explorer 4wd        explorer 4wd       
[67] explorer 4wd        explorer 4wd        explorer 4wd       
[70] explorer 4wd        f150 pickup 4wd     f150 pickup 4wd    
[73] f150 pickup 4wd     f150 pickup 4wd     f150 pickup 4wd    
 [ reached getOption("max.print") -- omitted 130 entries ]
attr(,"scores")
           4runner 4wd                     a4             a4 quattro 
              15.16667               18.85714               17.12500 
            a6 quattro                 altima                  camry 
              16.00000               20.66667               19.85714 
          camry solara            caravan 2wd                  civic 
              19.85714               15.81818               24.44444 
               corolla      dakota pickup 4wd            durango 4wd 
              25.60000               12.77778               11.85714 
          explorer 4wd        f150 pickup 4wd           forester awd 
              13.66667               13.00000               18.83333 
    grand cherokee 4wd             grand prix                    gti 
              13.50000               17.00000               20.00000 
           impreza awd                  jetta        k1500 tahoe 4wd 
              19.62500               21.28571               12.50000 
land cruiser wagon 4wd                 malibu                 maxima 
              12.00000               18.80000               18.66667 
       mountaineer 4wd             new beetle                 passat 
              13.25000               26.00000               18.57143 
        pathfinder 4wd    ram 1500 pickup 4wd            range rover 
              13.75000               11.40000               11.50000 
                sonata                tiburon      toyota tacoma 4wd 
              19.00000               18.28571               15.57143 
33 Levels: ram 1500 pickup 4wd range rover ... new beetle

mpg2$model <- reorder(mpg2$model, mpg2$cty)

mpg2$manufacturer <- reorder(mpg2$manufacturer, -mpg2$cty)

ggplot(mpg2, aes(cty, model)) +
  geom_point() +
  facet_grid(manufacturer~., 
             scales = "free",
             space = "free") +
  theme(strip.text.y = element_text(angle = 0))
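
The allocation rule behind space = "free" (a panel's share of the space is proportional to its scale range) can be checked with a little arithmetic. This is only a sketch of the rule as stated in the text, not of how ggplot2 computes panel sizes internally; the level counts in the second part are made up for illustration, and axis expansion is ignored:

```r
# Scale ranges of the two hypothetical panels from the text
panel_range <- c(a = 2, b = 4)

# Each panel's share of the space is its range divided by the total range
space_share <- panel_range / sum(panel_range)
space_share

# For a categorical scale the "range" is roughly the number of levels
# shown in the panel (hypothetical counts, not taken from mpg2)
n_levels <- c(audi = 3, dodge = 4, honda = 1)
level_share <- n_levels / sum(n_levels)
level_share
```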

7.2.4 Missing Facetting Variables

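What happens if you use facetting on a plot with multiple datasets and one of those datasets is missing the facetting variable? This situation commonly arises when you add contextual information that should be the same in all panels. For example, imagine a spatial display of disease that is faceted by gender. What happens when you add a map layer that does not contain the gender variable? Here ggplot2 does what you would expect: it displays the map in every facet, because a missing facetting variable is treated as if it had all values.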

df1 <- data.frame(x = 1:3, y = 1:3, gender = c("f", "f", "m"))
df2 <- data.frame(x = 2, y = 2)

ggplot(df1, aes(x, y)) +
  geom_point(data = df2, colour = "red", size = 5) +
  geom_point() +
  facet_wrap(~gender)

This technique is particularly useful when you add annotations to make it easier to compare between facets, as shown in the next section.

7.2.5 Grouping vs. Facetting

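Facetting is an alternative to using aesthetics (such as colour, shape or size) to differentiate groups. Both techniques have strengths and weaknesses, based around the relative positions of the subsets. With facetting, each group is quite far apart in its own panel, and there is no overlap between the groups. This is good if the groups overlap a lot, but it does make small differences harder to see. When using aesthetics to differentiate groups, the groups are close together and may overlap, but small differences are easier to see.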

df <- data.frame(
    x = rnorm(120, c(0, 2, 4)),
    y = rnorm(120, c(1, 2, 1)),
    z = letters[1:3]
)

ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z))

df %>% 
  ggplot(aes(x,y)) +
  geom_point() +
  facet_wrap(~z)

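Comparisons between facets often benefit from some thoughtful annotation. For example, in this case we could show the mean of each group in every panel. You will learn how to write summary code like this in Chap. 10. Note that we need two "z" variables: one for the facets and one for the colours.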

df_sum <- df %>%
  group_by(z) %>%
  summarise(x = mean(x), y = mean(y)) %>%
  rename(z2 = z)

df_sum

# A tibble: 3 x 3
  z2        x     y
  <fct> <dbl> <dbl>
1 a     0.212 0.832
2 b     2.12  1.69 
3 c     3.82  1.02

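# Facet by z: df_sum has no z column, so all three group means
# are drawn in every panel
ggplot(df, aes(x, y)) +
  geom_point() +
  geom_point(data = df_sum, aes(colour = z2), size = 6) +
  facet_wrap(~z)

# Facet by z2 instead: now df has no z2 column, so all points are
# drawn in every panel and each panel shows a single group mean
ggplot(df, aes(x, y)) +
  geom_point() +
  geom_point(data = df_sum, aes(colour = z2), size = 6) +
  facet_wrap(~z2)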

Another useful technique is to put all the data in the background of each panel:

df2 <- dplyr::select(df, -z)

ggplot(df, aes(x, y)) +
  geom_point(data = df2, colour = "grey70") +   # df2 has no z column, so it is drawn in every panel
  geom_point(aes(colour = z)) +
  facet_wrap(~z)

7.2.6 Continuous Variables

Divide the data into n bins each of the same length: cut_interval(x, n).
Divide the data into bins of width width: cut_width(x, width).
Divide the data into n bins each containing (approximately) the same number of points: cut_number(x, n = 10).

They are illustrated below:

# Bins of width 1
mpg2$disp_w <- cut_width(mpg2$displ, 1)

# Six bins of equal length
mpg2$disp_i <- cut_interval(mpg2$displ, 6)

# Six bins containing equal numbers of points
mpg2$disp_n <- cut_number(mpg2$displ, 6)

plot <- ggplot(mpg2, aes(cty, hwy)) +
  geom_point() +
  labs(x = NULL, y = NULL)

plot + facet_wrap(~disp_w, nrow = 1)

plot + facet_wrap(~disp_i, nrow = 1)

plot + facet_wrap(~disp_n, nrow = 1)
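
To see how the three helpers differ without drawing anything, it can help to tabulate the bins they produce. A small sketch on skewed toy data (the vector x below is simulated and not part of the original notes); no output is shown because the bin boundaries depend on the simulated values:

```r
library(ggplot2)

set.seed(1)
x <- rexp(200)  # skewed toy data

# Equal-length bins: counts per bin can be very uneven for skewed data
by_interval <- cut_interval(x, n = 4)
table(by_interval)

# Equal-count bins: the bin widths vary instead
by_number <- cut_number(x, n = 4)
table(by_number)

# Fixed-width bins: the number of bins depends on the range of x
by_width <- cut_width(x, width = 1)
table(by_width)
```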

Note: the facetting formula does not evaluate functions, so you must first create a new variable containing the discretised data.

7.2.7 Exercises

Diamonds: display the distribution of price conditional on cut and carat. Try facetting by cut and grouping by carat. Try facetting by carat and grouping by cut. Which do you prefer?

diamonds %>% head()

# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

diamonds %>% 
  ggplot(aes(price)) +
  geom_freqpoly(aes(col = cut))

diamonds %>% 
  ggplot(aes(price)) +
  geom_freqpoly() +
  facet_grid(~cut)

Diamonds: compare the relationship between price and carat for each colour. What makes it hard to compare the groups? Is grouping better or facetting? If you use facetting, what annotation might you add to make it easier to see the differences between panels?

diamonds %>% ggplot(aes(price,carat)) +
  geom_point(aes(col = color))

diamonds2 <- diamonds %>% 
  group_by(color) %>% 
  summarise(carat = mean(carat, na.rm = TRUE),
            price = mean(price, na.rm = TRUE)) %>% 
  rename(color2 = color)

diamonds2

# A tibble: 7 x 3
  color2 carat price
  <ord>  <dbl> <dbl>
1 D      0.658 3170.
2 E      0.658 3077.
3 F      0.737 3725.
4 G      0.771 3999.
5 H      0.912 4487.
6 I      1.03  5092.
7 J      1.16  5324.

diamonds %>% ggplot(aes(price,carat)) + 
  geom_point() +
  geom_point(data = diamonds2,aes(col = color2),size = 6) +
  facet_wrap(~color2)

Why is facet_wrap() generally more useful than facet_grid()?
Recreate the following plot. It facets mpg2 by class, overlaying a smooth curve fit to the full dataset.

mpg2 %>% 
  ggplot(aes(displ,hwy)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~class,nrow = 2)

mpg2 %>% 
  ggplot(aes(displ,hwy)) + 
  geom_point() +
  geom_smooth(data = mpg2 %>% select(-class)) +   # drop the facetting variable so the smooth uses the full dataset in every panel
  facet_wrap(~class)

7.3 Coordinate Systems

7.4 Linear Coordinate Systems

There are three linear coordinate systems: coord_cartesian(), coord_flip(), coord_fixed().

7.4.1 Zooming into a Plot with coord_cartesian()

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

# Full dataset
base

# Scaling to 5-7 throws away data outside that range
base + scale_x_continuous(limits = c(5, 7))

# Zooming to 5-7 keeps all the data but only shows some of it
base + coord_cartesian(xlim = c(5, 7))
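
The practical consequence is that the smooth is fitted to different data in the two plots. A rough sketch of the row counts involved, assuming the default out-of-bounds handling of scale_x_continuous() (values outside the limits are treated as missing); the variable names are illustrative:

```r
library(ggplot2)

# Rows whose displ falls inside the limits c(5, 7)
inside <- mpg$displ >= 5 & mpg$displ <= 7

# With scale limits, only these rows remain available to geom_smooth()
n_used_by_scale_limits <- sum(inside)

# With coord_cartesian(), the smooth is fitted to every row and the
# view is cropped afterwards
n_used_by_coord_zoom <- nrow(mpg)

c(scale_limits = n_used_by_scale_limits, coord_zoom = n_used_by_coord_zoom)
```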

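7.4.2 Flipping the Axes with coord_flip()

Most statistics and geoms assume you are interested in y values conditional on x values (e.g., smooth, summary, boxplot, line): in most statistical models, the x values are assumed to be measured without error. If you are interested in x conditional on y (or you just want to rotate the plot 90 degrees), you can use coord_flip() to exchange the x and y axes. Compare this with just exchanging the variables mapped to x and y: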

ggplot(mpg, aes(displ, cty)) +
  geom_point() +
  geom_smooth()

ggplot(mpg, aes(cty, displ)) +
  geom_point() +
  geom_smooth()

coord_flip() fits the smooth to the original data, and then rotates the output:

ggplot(mpg, aes(displ, cty)) +
  geom_point() +
  geom_smooth() +
  coord_flip()

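7.4.3 Equal Scales with coord_fixed()

coord_fixed() fixes the ratio of length on the x and y axes. The default ratio ensures that the x and y axes have equal scales: i.e., 1 cm along the x axis represents the same range of data as 1 cm along the y axis. The aspect ratio is also set so that the mapping is maintained regardless of the shape of the output device. See the documentation of coord_fixed() for more details.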
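
A short sketch of the default and of a non-default ratio, assuming ggplot2 is installed; the plot objects p_equal and p_tall are illustrative names, and cty against hwy is chosen only because both are measured in the same units:

```r
library(ggplot2)

p <- ggplot(mpg, aes(cty, hwy)) +
  geom_point()

# Default ratio = 1: one unit of cty is as long as one unit of hwy
p_equal <- p + coord_fixed()

# ratio is expressed as y / x: here one unit on the y axis is drawn
# twice as long as one unit on the x axis
p_tall <- p + coord_fixed(ratio = 2)

p_equal
p_tall
```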

7.5 Non-linear Coordinate Systems

rect <- data.frame(x = 50, y = 50)

line <- data.frame(x = c(1, 200), y = c(100, 1))

base <- ggplot(mapping = aes(x, y)) +
  geom_tile(data = rect, aes(width = 50, height = 50)) +
  geom_line(data = line) +
  xlab(NULL) + ylab(NULL)

base

base + coord_polar("x")

base + coord_polar("y")

base + coord_flip()

base + coord_trans(y = "log10")

base + coord_fixed()

We start with a line parameterised by its two endpoints:

df <- data.frame(r = c(0, 1), theta = c(0, 3 / 2 * pi))
ggplot(df, aes(r, theta)) +
  geom_line(size = 2) +
  geom_point(size = 6, colour = "red")

We break it into multiple line segments, each with two endpoints.

interp <- function(rng, n) {
  seq(rng[1], rng[2], length.out = n)
}

munched <- data.frame(
  r = interp(df$r, 15),
  theta = interp(df$theta, 15)
)

ggplot(munched, aes(r, theta)) +
  geom_line() +
  geom_point(size = 2, colour = "red")

We transform the locations of each piece:

transformed <- transform(munched,
  x = r * sin(theta),
  y = r * cos(theta)
)

ggplot(transformed, aes(x, y)) +
  geom_path() +
  geom_point(size = 2, colour = "red") +
  coord_fixed()
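
The three steps above can be combined into one function to see how the number of pieces affects the result. This is a sketch in base R only: munch_polar() and path_length() are hypothetical helpers written for this illustration, not ggplot2 functions, and they use the same x = r * sin(theta), y = r * cos(theta) mapping as the code above.

```r
# Break a straight line in (r, theta) space into n points and
# transform each point to Cartesian (x, y)
munch_polar <- function(r, theta, n = 15) {
  r_i <- seq(r[1], r[2], length.out = n)
  theta_i <- seq(theta[1], theta[2], length.out = n)
  data.frame(
    r = r_i,
    theta = theta_i,
    x = r_i * sin(theta_i),
    y = r_i * cos(theta_i)
  )
}

coarse <- munch_polar(c(0, 1), c(0, 3 / 2 * pi), n = 5)
fine <- munch_polar(c(0, 1), c(0, 3 / 2 * pi), n = 100)

# Length of the piecewise-linear path through the transformed points;
# with more pieces the path follows the spiral more closely
path_length <- function(d) sum(sqrt(diff(d$x)^2 + diff(d$y)^2))
c(coarse = path_length(coarse), fine = path_length(fine))
```

Using more pieces gives a smoother curve at the cost of more points to draw, which is the trade-off any such munching step has to balance.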