3.3.1 Exercises
1. What’s gone wrong with this code? Why are the points not blue?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

aes() interprets its arguments as data to be mapped through a scale, so the code above means roughly the same as the following.
mpg %>% mutate(color = "blue") %>%
ggplot() + geom_point(aes(x = displ, y = hwy, color = color))

Corrected version:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
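One way to see the difference between mapping and setting is to inspect the colour values each plot actually draws. The following is a minimal sketch, assuming ggplot2 is installed; the names p_mapped and p_set are made up for this illustration.

```r
library(ggplot2)

# Mapping: "blue" inside aes() is treated as a one-level data variable
p_mapped <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

# Setting: "blue" outside aes() is used directly as the point colour
p_set <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

# layer_data() returns the data a layer is drawn from, after scales are applied
mapped_colours <- unique(layer_data(p_mapped)$colour)
set_colours <- unique(layer_data(p_set)$colour)

print(mapped_colours)  # a colour picked by the default discrete colour scale
print(set_colours)     # the literal value that was set
```

In the mapped version the string only names a category, and the scale chooses the colour for it.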

3. Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
shape does not work with a continuous variable; ggplot2 raises an error when the plot is built.
## myplot <- function(aes_name, var_name) {
## aes_name <- sym(aes_name)
## var_name <- sym(var_name)
## p <- ggplot(mpg, aes(displ, hwy)) +
## geom_point(mapping = aes(!!aes_name := !!var_name))
## print(p)
## }
myplot <- function(aes_name, var_name) {
aes_name <- sym(aes_name)
var_name <- sym(var_name)
ggplot(mpg) +
geom_point(mapping = aes(displ, hwy, !!aes_name := !!var_name))
}
var_names <- mpg %>% select_if(is.numeric) %>% names
aes_names <- c("color", "size", "shape")
plots <- crossing(aes_name = aes_names, var_name = var_names) %>% pmap(myplot)
plots %>% walk(safely(print))
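A smaller check of the shape case alone, as a sketch assuming ggplot2 is installed. The error is caught with tryCatch() so the script keeps running; the exact wording of the message depends on the ggplot2 version, so it is printed rather than assumed.

```r
library(ggplot2)

# cty is numeric, so this maps a continuous variable to shape
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(shape = cty))

# The error is raised when the plot is built, not when it is defined
result <- tryCatch(
  {
    ggplot_build(p)
    "built without error"
  },
  error = function(e) conditionMessage(e)
)
print(result)
```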
4. What happens if you map the same variable to multiple aesthetics?
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class, shape = class, size = class))
Warning: Using size for a discrete variable is not advised.
Warning: The shape palette can deal with a maximum of 6 discrete values
because more than 6 becomes difficult to discriminate; you have 7.
Consider specifying shapes manually if you must have them.
Warning: Removed 62 rows containing missing values (geom_point).

5. What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)
# For shapes that have a border (like 21), you can colour the inside and
# outside separately. Use the stroke aesthetic to modify the width of the
# border
ggplot(mtcars, aes(wt, mpg)) +
geom_point(shape = 21, colour = "black", fill = "white", size = 5, stroke = 5)

6. What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you’ll also need to specify x and y.
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = displ < 5))
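The expression inside aes() is evaluated against the data, so it behaves like a temporary logical variable. A minimal sketch, assuming ggplot2 is installed, that checks how many distinct colours end up in the drawn layer:

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = displ < 5))

# The expression is evaluated per row, giving a logical (TRUE/FALSE) variable
print(table(mpg$displ < 5))

# One colour is drawn per logical value
n_colours <- length(unique(layer_data(p)$colour))
print(n_colours)
```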

3.5.1 Exercises
1. What happens if you facet on a continuous variable?
You get a large number of panels, one for each distinct value.
ggplot(mpg, aes(hwy, cty)) +
geom_point() +
facet_wrap(~displ)

2. What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?
They mean that no car in the data has that combination of cyl and drv.
expand(mpg, cyl, drv) %>% left_join(count(mpg, cyl, drv))
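To list only the combinations that are absent, an anti-join can replace the left join. This is a sketch assuming dplyr, tidyr, and ggplot2 (for the mpg data) are installed; the names all_combos, observed, and missing_combos are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # only needed for the mpg dataset

all_combos <- tidyr::expand(mpg, cyl, drv)  # every cyl x drv pair
observed <- count(mpg, cyl, drv)            # pairs that occur in the data

# Pairs with no matching rows correspond to the empty facets
missing_combos <- anti_join(all_combos, observed, by = c("cyl", "drv"))
print(missing_combos)
```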
3. What plots does the following code make? What does . do?
For compatibility with the classic interface, ‘rows’ can also be a formula with the rows (of the tabular display) on the LHS and the columns (of the tabular display) on the RHS; the dot in the formula is used to indicate there should be no faceting on this dimension (either row or column).
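facet_grid interprets the left-hand side of the formula as rows and the right-hand side as columns. A . means "nothing", that is, no faceting on that dimension.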
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ .)
## Same as above
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(rows = vars(drv))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
## Same as above
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(cols = vars(cyl))
4. Take the first faceted plot in this section:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?
As the number of classes grows, telling them apart by color becomes difficult. While there are only a few, color is easier to take in at a glance.
5. Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn’t facet_grid() have nrow and ncol arguments?
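nrow and ncol set the number of rows and columns used to lay out the panels. For facet_grid, these are determined by the data (the number of distinct values of the faceting variables), so there is nothing to specify.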
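The point about facet_grid can be checked from the panel layout of a built plot. This sketch assumes ggplot2 is installed and reads the layout table of the built plot, an internal structure that may differ between ggplot2 versions.

```r
library(ggplot2)

p <- ggplot(mpg) +
  geom_point(aes(displ, hwy)) +
  facet_grid(drv ~ cyl)

# The layout table has one row per panel with its ROW and COL position
panel_layout <- ggplot_build(p)$layout$layout

n_rows <- max(panel_layout$ROW)
n_cols <- max(panel_layout$COL)
print(c(rows = n_rows, cols = n_cols))
```

The grid dimensions match the number of distinct drv and cyl values, which is why nrow and ncol would have nothing to control.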
6. When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?
I did not really understand the question. Does it mean that faceting is pointless when cols has only one value?
mpg %>% mutate(x = 1) %>%
ggplot() +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(cols = vars(x))

3.6.1 Exercises
1. What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?
geom_line, geom_path, geom_boxplot, geom_histogram, geom_area
2. Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
`geom_smooth()` using method = 'loess' and formula 'y ~ x'

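3. What does show.legend = FALSE do? What happens if you remove it? Why do you think I used it earlier in the chapter?
It hides the legend for that layer; without it the legend is drawn. It was used earlier to keep the plot's appearance consistent with the other plots.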
ggplot(data = mpg) +
geom_smooth(
mapping = aes(x = displ, y = hwy, color = drv),
show.legend = FALSE
)
`geom_smooth()` using method = 'loess' and formula 'y ~ x'

4. What does the se argument to geom_smooth() do?
It controls whether the confidence interval around the smooth is displayed.
5. Will these two graphs look different? Why/why not?
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
ggplot() +
geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
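They look the same. The first plot sets the data and mapping as defaults in ggplot(), and both layers inherit them; the second repeats the same data and mapping in each layer.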
6. Recreate the R code necessary to generate the following graphs.
p_base <- ggplot(mpg, aes(displ, hwy))
p_point <- p_base + geom_point()
p1 <- p_point + geom_smooth(se = FALSE)
p2 <- p_point + geom_smooth(aes(group = drv), se = FALSE)
p_color <- p_base + geom_point(aes(color = drv))
p3 <- p_color + geom_smooth(aes(color = drv), se = FALSE)
p4 <- p_color + geom_smooth(se = FALSE)
p5 <- p_color + geom_smooth(aes(linetype = drv), se = FALSE)
p6 <- p_base + geom_point(color = "white", size = 4) + geom_point(aes(color = drv))
p1 + p2 + p3 + p4 + p5 + p6 + plot_layout(ncol = 2)

3.7.1 Exercises
1. What is the default geom associated with stat_summary()? How could you rewrite the previous plot to use that geom function instead of the stat function?
diamonds %>% group_by(cut) %>%
summarise(ymin = min(depth),
ymax = max(depth),
y = median(depth)) %>%
ggplot() +
geom_pointrange(aes(x = cut, y = y, ymin = ymin, ymax = ymax))
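The default geom can also be read off the layer object that stat_summary() creates. A small sketch, assuming ggplot2 is installed; it relies on the layer exposing its geom as a ggproto object, and summary_layer is an illustrative name.

```r
library(ggplot2)

# A layer object stores the geom it will draw with
summary_layer <- stat_summary()
default_geom <- class(summary_layer$geom)[1]
print(default_geom)
```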

2. What does geom_col() do? How is it different to geom_bar()?
geom_bar uses stat_count, whereas geom_col uses stat_identity. geom_col is handy when you want to plot data that you have already counted yourself.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
diamonds %>% count(cut) %>%
ggplot() +
geom_col(aes(x = cut, y = n))
diamonds %>% count(cut) %>%
ggplot() +
geom_bar(aes(x = cut, y = n), stat = "identity")
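As a check that the two approaches draw the same bars, the heights can be compared via layer_data(). This is a sketch assuming ggplot2 and dplyr are installed; p_bar and p_col are illustrative names.

```r
library(ggplot2)
library(dplyr)

# geom_bar counts the rows itself
p_bar <- ggplot(diamonds) +
  geom_bar(aes(x = cut))

# geom_col plots counts computed beforehand
p_col <- diamonds %>%
  count(cut) %>%
  ggplot() +
  geom_col(aes(x = cut, y = n))

bar_heights <- as.numeric(layer_data(p_bar)$y)
col_heights <- as.numeric(layer_data(p_col)$y)
print(bar_heights)
print(col_heights)
```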
3. Most geoms and stats come in pairs that are almost always used in concert. Read through the documentation and make a list of all the pairs. What do they have in common?
Skipped; too tedious.
4. What variables does stat_smooth() compute? What parameters control its behaviour?
y: predicted value
ymin: lower pointwise confidence interval around the mean
ymax: upper pointwise confidence interval around the mean
se: standard error
The results change when the smoothing method (the method argument) is changed.
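The computed variables can be confirmed by looking at the data of a built smooth layer. A minimal sketch, assuming ggplot2 is installed; method = "lm" is chosen here only to keep the example small.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_smooth(method = "lm", formula = y ~ x)

# layer_data() returns the values computed by stat_smooth for this layer
smooth_data <- layer_data(p)
computed <- intersect(c("y", "ymin", "ymax", "se"), names(smooth_data))
print(computed)
```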
5. In our proportion bar chart, we need to set group = 1. Why? In other words what is the problem with these two graphs?
1
prop is computed within each group. By default x defines the groups (cut in this case), so you need to pass 1 to put everything into a single group.
p1 <- ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop..))
ggplot_build(p1)$data[[1]] %>% select(count, prop, x, group)
p2 <- ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
ggplot_build(p2)$data[[1]] %>% select(count, prop, x, group)
p1 + p2

2
Setting group to cut does not work well, so I counted the data beforehand.
diamonds %>% count(cut, color) %>% mutate(prop = n/sum(n)) %>%
ggplot() +
geom_col(mapping = aes(x = cut, y = prop, fill = color))

3.8.1 Exercises
1. What is the problem with this plot? How could you improve it?
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point()

Many points overlap, so try jittering them.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_jitter()

2. What parameters to geom_jitter() control the amount of jittering?
width and height
3. Compare and contrast geom_jitter() with geom_count().
Apply geom_count to the same data that was jittered above.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_count()

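If the number of overlapping points varies from location to location, geom_count is the better choice. If it is roughly the same everywhere, geom_jitter is better.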
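How much the overlap varies can be checked before choosing between the two geoms. A sketch assuming dplyr and ggplot2 (for the mpg data) are installed; overlaps and n_points are illustrative names.

```r
library(dplyr)
library(ggplot2)  # only needed for the mpg dataset

# Number of observations sharing each (cty, hwy) position
overlaps <- count(mpg, cty, hwy, name = "n_points")

# A wide spread here suggests geom_count; a narrow one suggests geom_jitter
print(summary(overlaps$n_points))
```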
4. What’s the default position adjustment for geom_boxplot()? Create a visualisation of the mpg dataset that demonstrates it.
p <- ggplot(mpg, aes(class, hwy, color = drv))
p1 <- p + geom_boxplot() + labs(title = "dodge2")
p2 <- p + geom_boxplot(position = "identity") + labs(title = "identity")
p1 + p2
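The default position can also be read directly from the layer that geom_boxplot() creates. A small sketch, assuming ggplot2 is installed and that the layer exposes its position as a ggproto object; boxplot_layer is an illustrative name.

```r
library(ggplot2)

# A layer object stores the position adjustment it will apply
boxplot_layer <- geom_boxplot()
default_position <- class(boxplot_layer$position)[1]
print(default_position)
```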

3.9.1 Exercises
1. Turn a stacked bar chart into a pie chart using coord_polar().
p1 <- diamonds %>%
ggplot() +
geom_bar(mapping = aes(x = cut, fill = color), position = "fill")
p2 <- p1 + coord_polar(theta = "y")
p1 + p2

2. What does labs() do? Read the documentation.
It adds labels such as the title and caption, as well as axis and legend labels.
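3. What’s the difference between coord_quickmap() and coord_map()?
They differ in how the distortion from plotting latitude and longitude is corrected: coord_map() applies a real map projection, while coord_quickmap() uses a quick approximation that only adjusts the aspect ratio.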
nz <- map_data("nz")
# Prepare a map of NZ
nzmap <- ggplot(nz, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "white", colour = "black")
# Plot it in cartesian coordinates
(nzmap + labs(title = "normal")) +
(nzmap + coord_map() + labs(title = "map")) +
(nzmap + coord_quickmap() + labs(title = "quickmap"))

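4. What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?
geom_abline() draws a reference line, by default with slope 1 and intercept 0. To show how cty and hwy relate to that slope-1 line, coord_fixed() keeps the aspect ratio of the plotting area fixed even when the aspect ratio of the plot window changes.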
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
geom_abline() +
coord_fixed()
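What the reference line shows can also be expressed as a number: the share of cars whose highway mileage exceeds their city mileage, which are the points lying above the slope-1 line. A sketch assuming ggplot2 is installed for the mpg data; share_above is an illustrative name.

```r
library(ggplot2)  # only needed for the mpg dataset

# Fraction of cars plotted above the line hwy = cty
share_above <- mean(mpg$hwy > mpg$cty)
print(share_above)
```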

