In EDA, you need to learn how to use plots as tools for exploration.

library(tidyverse)
## ─ Attaching packages ──── tidyverse 1.2.1 ─
## ✔ ggplot2 3.1.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.1     ✔ dplyr   0.8.1
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ─ Conflicts ───── tidyverse_conflicts() ─
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggrepel)

Label

The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels.
You add labels with the labs() function.

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  labs(title = "Fuel efficiency generally decreases with engine size")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

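You can add more text with two further arguments:
> subtitle: adds additional detail in a smaller font beneath the title
> caption: adds text at the bottom right of the plot, often used to describe the source of the data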

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov"
  )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

You can also use labs() to replace the axis and legend titles.

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    colour = "Car type"
  )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

If you want to use mathematical equations instead of text strings, use quote(). The quoted expression is rendered with R’s plotmath syntax (see ?plotmath).

df <- tibble(
  x = runif(10),
  y = runif(10)
)
ggplot(df, aes(x, y)) +
  geom_point() +
  labs(
    x = quote(sum(x[i] ^ 2, i == 1, n)),
    y = quote(alpha + beta + frac(delta, theta))
  )

Annotations

In addition to labelling the major components of your plot, it’s often useful to label individual observations or groups of observations.
> geom_text()
This function is similar to geom_point(), but it has an additional aesthetic: label.

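There are two sources of labels.
First, you might have a tibble that provides labels. The plot below labels the car with the highest highway mileage in each class. It isn’t terribly useful, but it illustrates a useful approach.
Second, you might just want to add a single label to the plot. You still need to create a data frame for it, as the facet examples later in this section do.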

best_in_class <- mpg %>%
  group_by(class) %>%
  filter(row_number(desc(hwy)) == 1)

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_text(aes(label = model), data = best_in_class)

This is hard to read, because the labels overlap with each other and with the points.
We can make things a little better by switching to geom_label(), which draws a rectangle behind the text.

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_label(aes(label = model), 
             data = best_in_class, 
             nudge_y = 2, # move the labels slightly above the points
             alpha = 0.5) # make the label background semi-transparent

That helps a bit, but if you look closely at the top-left corner, you’ll notice two labels practically on top of each other.
You can use the ggrepel package instead. Its geom_label_repel() automatically adjusts the labels so that they don’t overlap.

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_point(size = 3, 
             shape = 1, 
             data = best_in_class) +
  geom_label_repel(aes(label = model), 
                   data = best_in_class)

Note another handy technique used here: I added a second layer of large, hollow points to highlight the points that I’ve labelled.

Remember that in addition to geom_text(), ggplot2 has many other geoms that can help you annotate your plot:
> geom_hline()
add horizontal reference lines

> geom_vline()
add vertical reference lines

> geom_rect()
draw a rectangle around points of interest

> geom_segment()
draw a line segment, for example an arrow (via the arrow argument) pointing at a point of interest
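Here is a minimal sketch that uses all four geoms on the mpg data. The median reference lines and the coordinates of the box and arrow are illustrative choices and do not come from the notes above. The names `box`, `pointer` and `p` are hypothetical. The rectangle and the segment each get a one-row data frame, so exactly one of each is drawn. Setting inherit.aes = FALSE stops them from looking for displ and hwy columns that those data frames don’t have.

```r
library(ggplot2)

# Illustrative (hypothetical) annotation data: one box around the
# large-engine points with unusually high highway mileage, and one arrow
# pointing at it; the coordinates are hand-picked for this sketch
box <- data.frame(xmin = 5.5, xmax = 7.2, ymin = 22, ymax = 27)
pointer <- data.frame(x = 4, y = 38, xend = 5.6, yend = 27)

p <- ggplot(mpg, aes(displ, hwy)) +
  # dashed reference lines at the median displacement and median hwy,
  # drawn first so they sit underneath the points
  geom_hline(yintercept = median(mpg$hwy), linetype = "dashed") +
  geom_vline(xintercept = median(mpg$displ), linetype = "dashed") +
  geom_point() +
  # an unfilled rectangle; inherit.aes = FALSE because `box` has no
  # displ or hwy columns
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            data = box, inherit.aes = FALSE,
            fill = NA, colour = "red") +
  # a single segment with an arrowhead at its (xend, yend) end
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend),
               data = pointer, inherit.aes = FALSE,
               arrow = grid::arrow(length = grid::unit(0.3, "cm")))
p
```

The reference lines come before geom_point() so that they are drawn underneath the data.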

A more complicated task: how can you put a different label in each facet?

The answer lies in the label data frame. If it does not contain the faceting variable, the text is drawn in every facet:

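label <- tibble(
  displ = Inf,  # Inf places the label at the right edge of each panel ...
  hwy = Inf,    # ... and at the top edge; hjust/vjust below keep it inside
  label = "Increasing engine size is \nrelated to decreasing fuel economy."
)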

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(aes(label = label),
    data = label, 
    vjust = "top", 
    hjust = "right",
    size = 2) +
  facet_wrap(~class)

To draw the label in only one facet, add a column to the label data frame with the value of the faceting variable(s) in which to draw it

label <- tibble(
  displ = Inf,
  hwy = Inf,
  class = "2seater",
  label = "Increasing engine size is \nrelated to decreasing fuel economy."
)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(aes(label = label),
    data = label, 
    vjust = "top", 
    hjust = "right",
    size = 2
  ) +
  facet_wrap(~class)

To draw a different label in each facet, give the label data frame one row per value of the faceting variable(s):

label <- tibble(
  displ = Inf,
  hwy = Inf,
  class = unique(mpg$class), # one row (and one label) per facet
  label = str_c("Label for ", class)
)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() + 
  facet_wrap(~class) +
  geom_text(aes(label = label),
    data = label, 
    vjust = "top", 
    hjust = "right",
    size = 3
  ) 

Similarly, we can add a different vertical line in each facet. Create an auxiliary data frame, aux, that holds the faceting variable together with the x-intercepts for each facet.

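# one row per facet (cyl value), holding that facet's two x-intercepts
aux <- data.frame(cyl = c(4, 6, 8), 
                  l = c(3.5, 4, 4.5), 
                  m = c(5, 4.3, 3))

ggplot(mtcars, aes(x = drat)) + 
  geom_line(aes(y = mpg, 
                colour = "mpg")) + 
  geom_line(aes(y = qsec, 
                colour = "qsec"))  +
  facet_wrap(~cyl) + 
  # because aux contains cyl, each facet gets only its own intercepts;
  # mapping colour to a constant string adds the line to the legend
  geom_vline(aes(xintercept = l, 
                 colour = "xiaopang"),
             data = aux,
             lty = 2) +
  geom_vline(aes(xintercept = m, 
                 colour = "xiaomei"),
             data = aux,
             lty = 2)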

Scales

The third way you can make your plot better for communication is to adjust the scales

Normally, ggplot2 adds scales for you automatically. For example, when you type:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class))

ggplot2 adds default scales behind the scenes, so the code above is equivalent to:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_colour_discrete()

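Note the naming scheme for scales: scale_, then the name of the aesthetic, then _, then the name of the scale. The default scales are named after the type of variable they align with: continuous, discrete, datetime, or date.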
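The sketch below shows the date case. The date column of ggplot2’s economics data has class Date, so ggplot2 would pick scale_x_date() on its own. Writing the scale out explicitly is one way to adjust it. The ten-year breaks are an illustrative choice, and `p` is just a hypothetical variable name.

```r
library(ggplot2)

# economics$date is a Date column, so ggplot2 would add scale_x_date()
# by default; spelling it out lets us tweak its breaks and labels
p <- ggplot(economics, aes(date, unemploy)) +
  geom_line() +
  scale_x_date(date_breaks = "10 years",  # one tick every ten years
               date_labels = "%Y")        # labelled with the 4-digit year
p
```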

The default scales have been carefully chosen to do a good job for a wide range of inputs. Nevertheless, you might want to override the defaults for two reasons:
First, you might want to tweak some of the parameters of the default scale. This lets you do things like change the breaks on the axes or the key labels on the legend.
Second, you might want to replace the scale altogether and use a completely different algorithm.

Axis ticks and legend keys

There are two primary arguments that affect the appearance of the ticks on the axes and the key on the legend: breaks and labels
> breaks
breaks control the position of the ticks, or the values associated with the keys
> labels
labels control the text label associated with each tick/key

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(breaks = seq(15, 40, by = 5))  

You can use labels in the same way (a character vector the same length as breaks). You can also set it to NULL to suppress the labels altogether.

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_x_continuous(labels = NULL) +
  scale_y_continuous(labels = NULL)
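Here is a minimal sketch that combines both arguments. It puts a tick every 5 mpg and gives each tick its own label. labels is matched to breaks position by position, so the two vectors must have the same length. The `hwy_breaks` name and the "mpg" suffix are illustrative choices.

```r
library(ggplot2)

# hypothetical helper: tick positions on the hwy (miles per gallon) axis
hwy_breaks <- seq(15, 40, by = 5)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(
    breaks = hwy_breaks,
    # one label per break, in the same order: "15 mpg", "20 mpg", ...
    labels = paste(hwy_breaks, "mpg")
  )
p
```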

Breaks and labels also control the keys of legends. Collectively, axes and legends are called guides: axes are used for the x and y aesthetics, and legends are used for everything else.
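For a legend, breaks and labels act on the keys instead of the ticks. The sketch below uses the drv variable, whose values are "4", "f" and "r". breaks chooses which keys appear and in what order, labels renames them, and name sets the legend title. The label wording is an illustrative choice.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  scale_colour_discrete(
    name = "Drive train",                            # legend title
    breaks = c("f", "r", "4"),                       # which keys, in this order
    labels = c("front-wheel", "rear-wheel", "4wd")   # text shown for each key
  )
p
```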

Legend layout

You will most often use breaks and labels to tweak the axes.
To control the overall position of the legend, you need a theme() setting. theme() controls the non-data parts of the plot.

The theme setting legend.position controls where the legend is drawn.

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class))

base + theme(legend.position = "left")

base + theme(legend.position = "right") # the default 

base + theme(legend.position = "none") # suppress the display of the legend altogether

To control the display of individual legends, use guides() together with guide_legend() or guide_colourbar().

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(nrow = 1,  # lay the legend keys out in a single row
                               override.aes = list(size = 4))) # override the size aesthetic to make the legend points bigger
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
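The example above uses guide_legend(). For a continuous colour scale, the legend is a colour bar, and guide_colourbar() plays the same role. Here is a minimal sketch that maps the continuous cty variable to colour. The title text and the choice of reverse = TRUE are illustrative.

```r
library(ggplot2)

# cty (city mpg) is continuous, so its colour legend is drawn as a bar
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = cty)) +
  guides(colour = guide_colourbar(
    title = "City mpg",  # legend title
    reverse = TRUE       # put the highest values at the bottom of the bar
  ))
p
```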

Replacing a scale

Instead of just tweaking the details a little, you can replace the scale altogether.

There are two types of scales you’re most likely to want to switch out: continuous position scales and colour scales.

It’s very useful to plot transformations of your variables. For example, the relationship between carat and price is easier to see when both are log-transformed:

ggplot(diamonds, aes(carat, price)) +
  geom_bin2d()

ggplot(diamonds, 
       aes(log10(carat), log10(price))) +
  geom_bin2d()

However, this transformation has a disadvantage: the axes are now labelled with the transformed values, which makes the plot hard to interpret.

Instead of doing the transformation in the aesthetic mapping, we can do it with the scale. The result looks the same, except that the axes are labelled on the original data scale.

ggplot(diamonds, aes(carat, price)) +
  geom_bin2d() + 
  scale_x_log10() + 
  scale_y_log10()

Another scale that is frequently customised is colour. The ColorBrewer scales, available through scale_colour_brewer(), have been hand-tuned to work better for people with common types of colour blindness.

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = drv)) +
  scale_colour_brewer(palette = "Set1")

RColorBrewer::display.brewer.all()

The plot above shows the complete list of palettes. The sequential (top) and diverging (bottom) palettes are particularly useful if your categorical values are ordered or have a “middle”.
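Here is a small example of the ordered case. diamonds$cut is an ordered factor (Fair < Good < Very Good < Premium < Ideal). A sequential palette such as "Blues" therefore lets the shading follow that order. The palette choice is illustrative, and scale_fill_brewer() is the fill counterpart of scale_colour_brewer().

```r
library(ggplot2)

# cut is ordered, so light-to-dark blues mirror Fair -> Ideal
p <- ggplot(diamonds, aes(cut, fill = cut)) +
  geom_bar() +
  scale_fill_brewer(palette = "Blues")
p
```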

When you have a predefined mapping between values and colours, use scale_colour_manual(). For example, each US president in the presidential data can be coloured by party:

presidential %>%
  # number each president from 34 (Eisenhower, the first in the data)
  mutate(id = 33 + row_number()) %>%
  ggplot(aes(start, 
             id, 
             colour = party)) +
    geom_point() +
    geom_segment(aes(xend = end, 
                     yend = id)) +
    scale_colour_manual(
      values = c(Republican = "red",
                 Democratic = "blue"))

For continuous colour, you can use the built-in scale_colour_gradient() or scale_fill_gradient().
If you have a diverging scale, you can use scale_colour_gradient2(). It lets you give positive and negative values different colours, for example.
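The minimal sketch below contrasts the two gradients on made-up data. The data frame `df`, its column `z` and the colour choices are hypothetical. The sequential scale shades |z| from light to dark. The diverging scale colours negative z towards blue and positive z towards red, with values at midpoint = 0 drawn in white.

```r
library(ggplot2)

# Hypothetical data: z is negative for some points and positive for others
set.seed(1)
df <- data.frame(x = runif(50), y = runif(50))
df$z <- df$x - df$y

# Sequential: a single gradient from a light `low` to a dark `high` colour
p_seq <- ggplot(df, aes(x, y, colour = abs(z))) +
  geom_point() +
  scale_colour_gradient(low = "grey90", high = "darkblue")

# Diverging: values below `midpoint` shade towards `low`, values above it
# towards `high`, and values at the midpoint get the `mid` colour
p_div <- ggplot(df, aes(x, y, colour = z)) +
  geom_point() +
  scale_colour_gradient2(low = "blue", mid = "white", high = "red",
                         midpoint = 0)
p_seq
p_div
```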

Zooming

To zoom in on a region of the plot, it’s generally best to use coord_cartesian().

ggplot(mpg, mapping = aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  coord_cartesian(xlim = c(5,7),
                  ylim = c(10,30))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
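coord_cartesian() is generally preferred because it only changes what is shown. If you set limits on the x and y scales instead (xlim() and ylim() are shorthands for this), observations outside the limits become missing values before any statistics are computed. A smoother is then refitted on the remaining points only. The sketch below builds both versions so you can compare them. It uses a linear smoother, and the names `p_coord` and `p_scale` are hypothetical.

```r
library(ggplot2)

# Zooming with the coordinate system: every observation is kept, so the
# line is fitted on all cars and then only the visible window is drawn
p_coord <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))

# "Zooming" with scale limits: points outside the limits become NA and are
# dropped before the fit, so the line describes only the remaining cars
# (ggplot2 warns about the removed rows)
p_scale <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_continuous(limits = c(5, 7)) +
  scale_y_continuous(limits = c(10, 30))

p_coord
p_scale
```

The warning about removed rows from p_scale signals that data were discarded, not merely hidden.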

Themes

You can customise the non-data elements of your plot with a theme, for example theme_bw():

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  theme_bw()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
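theme_bw() is one of several complete themes that ship with ggplot2. The sketch below tries a few others and then combines a complete theme with a single theme() tweak. The names `base` and `p` are illustrative. Order matters: adding a complete theme replaces any earlier theme() settings, so the tweak has to come after it.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class))

# a few of the other complete themes that ship with ggplot2
base + theme_classic()   # axis lines, no grid lines
base + theme_minimal()   # no background box or axis lines
base + theme_light()     # light grey lines and axes

# a complete theme followed by theme() to adjust a single element;
# the reverse order would let theme_minimal() overwrite the tweak
p <- base + theme_minimal() + theme(legend.position = "bottom")
p
```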