The introductory workshop material is here: Part 1, Part 2.

Topics

  • The ggplot2 pipeline
  • Plot layers
  • The relationship between stats and geoms
  • Scales
  • Coordinate transformations
  • Expressions
  • Annotations

Goal: clearer, more effective data visualizations. This will involve understanding what constitutes a ggplot layer, the relationship between stats and geoms, and the difference between scale transformations and coordinate transformations.

Assumptions

I assume you’re familiar with the basic mechanics of ggplot2 as covered in the introductory workshop (links above):

  • The fundamentals of R and ggplot2
  • Basic data visualization concepts
  • Grammar of graphics
  • Building a plot: dataset structure, ggplot(), aesthetics, geoms, labels, themes, facets
  • Aesthetic and theme inheritance principles

The ggplot2 pipeline

Today we’re loosely following the ggplot2 visualization pipeline as given in Figure 13.2 from the ggplot2 book. It’s worth understanding the key steps of this sequence, because their order determines what each step can and cannot do.

Plot layers

We think of geoms as layers, and that’s usually true, but what actually makes up a plot layer?

  • An optional mapping of aesthetics that supplements or overrides the default mapping set in ggplot()
  • An optional dataset that overrides the default plot dataset
  • A geom: the geometric object used to draw each observation
  • A stat: the statistical transformation to use (even if it’s only the “identity” stat)
  • A position used to adjust overlapping objects

With this we can start to truly understand the usage of geom_point() for example:

geom_point(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
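To make the five components concrete, here is a minimal sketch that builds the same point layer twice: once with geom_point() and once with ggplot2's lower-level layer() function, naming each piece explicitly. The object names p_short and p_long are only for illustration.

```r
library(ggplot2)

# The familiar shorthand
p_short <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# The same layer spelled out component by component
p_long <- ggplot(mpg, aes(displ, hwy)) +
  layer(
    mapping = NULL,          # inherit the plot-level aesthetics
    data = NULL,             # inherit the plot-level dataset
    geom = "point",          # geometric object
    stat = "identity",       # statistical transformation (none)
    position = "identity",   # position adjustment (none)
    params = list(na.rm = FALSE)
  )

# Compare the positions each layer computed
all.equal(layer_data(p_short)[c("x", "y")], layer_data(p_long)[c("x", "y")])
```

In everyday use you would rarely call layer() directly; it is shown here only to expose what the geom_*() shortcuts fill in for you.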

Datasets and mappings

We’re not spending much time on these today.

Key point: different plot layers may, but don’t have to, have different data sources and aesthetics that override the plot-level ones defined in ggplot().
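As a small illustration of this key point, the sketch below gives the second layer its own dataset and an extra aesthetic, while the first layer uses the plot-level defaults. The subset chosen (two-seaters) and the name two_seaters are arbitrary.

```r
library(ggplot2)

# A layer-specific dataset: only the two-seater cars
two_seaters <- subset(mpg, class == "2seater")

p <- ggplot(mpg, aes(displ, hwy)) +
  # Layer 1: plot-level data and mapping
  geom_point(color = "grey60") +
  # Layer 2: its own data, plus a color mapping that supplements aes(displ, hwy)
  geom_point(data = two_seaters, aes(color = factor(cyl)), size = 3)
p
```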

EXERCISE: plot layers

How was this plot built? What layers and data sources can you see?

From Our favorite Pew Research Center data visualizations from 2014

Geoms

## mpg

Many of the examples below will use ggplot2’s mpg dataset.

This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.

Take a quick look at mpg before going farther.

Fundamentals

Let’s talk about geoms—geometric objects. They are the fundamental building blocks of ggplot2, and can be put into several categories. There are the basic geoms:

Geom            Requires                   Understands             Notes
geom_point()    x, y                       color, size, …          cf. geom_jitter()
geom_line()     x, y                       color, size, linetype   cf. geom_path()
geom_bar()      x, y                       color, fill, …          cf. geom_col()
geom_area()     x or y (also min, max)     color, fill             cf. geom_ribbon()
Polygon geoms   x, y                       color, size, linetype   geom_rect(), geom_tile(), geom_polygon()

theme_update(title = element_text(size = 20, face = "bold"))
df <- data.frame(x = c(3, 1, 5), 
                 y = c(2, 4, 6))
base <- ggplot(df, aes(x, y)) +
  labs(x = NULL, y = NULL)

base + geom_point() + ggtitle("point")
base + geom_jitter() + ggtitle("jitter")
base + geom_line() + ggtitle("line")
base + geom_path() + ggtitle("path")
base + geom_bar(stat = "identity") + ggtitle("bar")
base + geom_tile() + ggtitle("tile")
base + geom_raster() + ggtitle("raster")
base + geom_polygon() + ggtitle("polygon")
base + geom_area() + ggtitle("area")

I would also call these basic, but they have additional aesthetic requirements: xmin, xmax, ymin, and ymax:

df <- data.frame(x = c(3, 1, 5), 
                 xmin = c(2, 0, 2),
                 xmax = c(4, 3, 5),
                 y = c(2, 4, 6),
                 ymin = c(1, 3, 2),
                 ymax = c(2.5, 5, 6))
base <- ggplot(df, aes(x, y, xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)) +
  labs(x = NULL, y = NULL)

base + geom_rect(alpha = 0.5) + ggtitle("rect")
base + geom_ribbon() + ggtitle("ribbon")
base + geom_errorbar() + ggtitle("errorbar")
base + geom_pointrange() + ggtitle("pointrange")

Summarizing geoms

And then there are the basic statistical summary geoms:

base <- ggplot(mpg, aes(fill = class)) +
  labs(x = NULL, y = NULL) +
  theme(legend.position = "none")

base + geom_histogram(aes(x = displ)) + ggtitle("histogram")
base + geom_density(aes(x = displ)) + ggtitle("density")
base + geom_freqpoly(aes(x = displ)) + ggtitle("freqpoly")
base + geom_boxplot(aes(x = class, y = displ)) + ggtitle("boxplot")
base + geom_violin(aes(x = class, y = displ)) + ggtitle("violin")
base + geom_dotplot(aes(x = class), stackratio = 0.35) + ggtitle("dotplot")

Note that geoms can build on each other:

  • geom_histogram() uses geom_bar() but a different statistical summary, stat_bin()
  • geom_smooth() (not shown above) uses geom_line() and geom_ribbon()
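One way to check the first claim is to draw a bar layer with the binning stat and compare it with a histogram. This is a sketch, assuming the bins argument is passed through to the stat in both cases; the names p_hist and p_bar are illustrative.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ))

p_hist <- base + geom_histogram(bins = 10)
p_bar  <- base + geom_bar(stat = "bin", bins = 10)

# Both layers should report the same per-bin counts
all.equal(layer_data(p_hist)$count, layer_data(p_bar)$count)
```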

Overplotting

Whether you use a ‘basic’ geom or a statistical summary geom depends on a number of factors, but one of them is simply data volume. When you have so many data points that they obscure each other, it’s a problem: statistical relationships can be hidden, and the viewer can no longer see the data distribution(s).

At low to moderate data densities, try changing your plot symbol, size, or transparency:

# Example from https://ggplot2-book.org/statistical-summaries.html#overplotting
df <- data.frame(x = rnorm(4000), y = rnorm(4000))
norm <- ggplot(df, aes(x, y)) + 
  labs(x = NULL, y = NULL) 

norm + geom_point()
norm + geom_point(shape = 1) # Hollow circles
norm + geom_point(shape = ".") # Pixel sized
norm + geom_point(alpha = 0.1) # Semi-transparent

At higher data densities you’ll need two-dimensional density geoms:

norm + geom_bin2d() + ggtitle("bin2d")
norm + geom_hex() + ggtitle("hex")
norm + geom_hex(bins = 10) + ggtitle("hex (bins = 10)")
norm + geom_density2d() + ggtitle("density2d")

theme_set(thm)

Position adjustments

Each geom has a position parameter that specifies how minor and/or group-wise position adjustments are to be handled.

theme_update(title = element_text(size = 20, face = "bold"))
base <- ggplot(mpg, aes(cty, fill = class)) + 
  xlab(NULL) + ylab(NULL) + theme(legend.position = "none")

base + geom_bar() + ggtitle("stack") # the default
base + geom_bar(position = "dodge") + ggtitle("dodge")
base + geom_bar(position = "fill") + ggtitle("fill")
base + geom_bar(position = "identity") + ggtitle("identity") +
  theme(panel.background = element_rect(fill = "gray"))

Some position adjustments are particularly useful for points:

base <- ggplot(mpg, aes(displ, cty, color = class)) + 
  xlab(NULL) + ylab(NULL) + theme(legend.position = "none")

base + geom_point() + ggtitle("identity") # the default
base + geom_point(position = position_nudge(x = 2, y = -2)) + ggtitle("nudge")
base + geom_point(position = "jitter") + ggtitle("jitter")
theme_set(thm)

Note that to change the adjustment defaults, you need to construct a position object with the desired parameters (see position_nudge() example above).
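The same idea applies to jittering. The sketch below contrasts the string form, which uses the default jitter amounts, with a position_jitter() object whose parameters are set explicitly; the particular width and seed values are arbitrary choices.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, cty))

# String form: default jitter amounts, new random offsets every time
p_default <- base + geom_point(position = "jitter")

# Object form: jitter horizontally only, and fix the seed so the
# plot is reproducible
p_custom <- base +
  geom_point(position = position_jitter(width = 0.05, height = 0, seed = 42))

p_default
p_custom
```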

Geoms and stats

Every geom has a default stat, and every stat has a default geom.

Most stats take additional parameters to specify the details of statistical transformation; these are usually supplied in ... (in which case stat and geom parameters are automatically teased apart).

Geom defaults

The different behavior of different geoms can be seen from their parameter defaults:

Parameter     geom_point    geom_boxplot   geom_histogram   geom_abline
mapping       NULL          NULL           NULL             NULL
data          NULL          NULL           NULL             NULL
stat          "identity"    "boxplot"      "bin"            NULL
position      "identity"    "dodge2"       "stack"          NULL
show.legend   NA            NA             NA               NA
inherit.aes   TRUE          TRUE           TRUE             FALSE

Some questions to think about:

  • Why does geom_abline() not inherit aesthetics?
  • What’s the difference between geom_point() and geom_jitter()? (Look at the help.)
  • What’s the difference between geom_path() and geom_polygon()?
  • What about geom_path() versus geom_line()?

Changing the stat

Geoms have their stats, and stats have their geoms. But we can use alternate stats within geoms, and alternate geoms within stats:

theme_update(title = element_text(size = 20, face = "bold"))
base <- ggplot(mpg, aes(displ, hwy))
base + geom_point() + ggtitle("geom_point")
base + geom_point(stat = "ecdf") + 
  ggtitle('geom_point(stat = "ecdf")')
ggplot(mpg, aes(displ, hwy)) + stat_ecdf() + ggtitle("stat_ecdf")
ggplot(mpg, aes(displ, hwy)) + stat_ecdf(geom = "point") +
  ggtitle('stat_ecdf(geom = "point")')
theme_set(thm)

stat_summary()

We can also use stat_summary(), a useful (but not all-purpose, see below) tool.

For example, what if we’d like to plot data within groups, but also display the group mean? We could summarize the data separately, and then add a second geom_point(data = ...) call, but there’s a slicker way.

ggplot(mpg, aes(class, hwy, color = displ)) + 
  geom_jitter() + 
  # stat_summary(fun = "mean", geom = "point", color = "red", size = 4)
  geom_point(stat = "summary", fun = "mean", color = "red", size = 4)

You might ask whether the summary point can be assigned a color corresponding to the mean of its group—i.e., whether we can use stat_summary to compute both mean(hwy) and mean(displ). As far as I know this is not possible; you would need to pre-compute a separate summary dataset:

mpg_agg <- aggregate(list(hwy = mpg$hwy, displ = mpg$displ),
                     by = list(class = mpg$class), mean)
ggplot(mpg, aes(class, hwy, color = displ)) + 
  geom_jitter() + 
  geom_point(data = mpg_agg, size = 4)

EXERCISE: stat_summary

In addition to using built-in functions like mean that return a single value, stat_summary can use a user-provided function that returns three values:

fun.data: A function that is given the complete data and should return a data frame with variables ymin, y, and ymax.

Starting from ggplot(mpg, aes(class, hwy, color = displ)), write a custom summary function to produce the following graph that shows each group’s median and standard deviation:

my_smry <- function(y) { 
  ysd <- sd(y, na.rm = TRUE)
  ymd <- median(y, na.rm = TRUE)
  data.frame(y = ymd, ymin = ymd - ysd, ymax = ymd + ysd)
}

ggplot(mpg, aes(class, hwy, color = displ)) + 
  geom_jitter() + 
  geom_pointrange(stat = "summary", fun.data = "my_smry", color = "red")
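For the common mean-and-standard-error case you may not need a custom function at all: ggplot2 ships a helper, mean_se(), that returns the y, ymin, and ymax columns that fun.data expects. A brief sketch (the grey jitter styling is just a choice for this example):

```r
library(ggplot2)

# mean_se() returns a one-row data frame with y, ymin, and ymax
se_example <- mean_se(c(1, 2, 3))
se_example

p <- ggplot(mpg, aes(class, hwy)) +
  geom_jitter(color = "grey60", width = 0.2) +
  # stat_summary() draws a pointrange by default
  stat_summary(fun.data = mean_se, color = "red")
p
```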

Finally, though, I tend not to use the explicit stat_* functionality of ggplot2 all that much; in general I find it simpler and clearer to pre-compute my summary statistics and then visualize them using the default geoms. YMMV however.
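For comparison, here is a sketch of the pre-compute approach applied to the exercise above, using only base R to build the summary table. The names med, sdev, and mpg_smry are made up for this example.

```r
library(ggplot2)

# Per-class median and standard deviation of highway mileage
med  <- tapply(mpg$hwy, mpg$class, median)
sdev <- tapply(mpg$hwy, mpg$class, sd)

mpg_smry <- data.frame(
  class = names(med),
  hwy   = as.numeric(med),
  ymin  = as.numeric(med - sdev),
  ymax  = as.numeric(med + sdev)
)

# Raw data in one layer, the pre-computed summary in another
p <- ggplot(mpg, aes(class, hwy)) +
  geom_jitter(aes(color = displ)) +
  geom_pointrange(data = mpg_smry, aes(ymin = ymin, ymax = ymax), color = "red")
p
```

The summary table is an ordinary data frame, so it can be inspected, tested, or reused in a table without going through ggplot2.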

Scales

From the ggplot2 book:

Formally, each scale is a function from a region in data space (the domain of the scale) to a region in aesthetic space (the range of the scale). The axis or legend is the inverse function: it allows you to convert visual properties back to data.

  • Every aesthetic in a plot is associated with exactly one scale.
  • The default scale depends on the aesthetic and the variable type.
  • If you supply two scales for the same aesthetic, the last takes precedence.
  • Scale functions are of the form scale_<aesthetic>_<type of scale>()
  • There are continuous scales, discrete scales, and binned scales
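A short sketch of the first three points: the plot below gets its default scales implicitly, then the same scales are added by hand, and finally the color scale is replaced. The object names are illustrative only.

```r
library(ggplot2)

# Scales are added automatically for x, y, and color
p_implicit <- ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point()

# Writing out the default scales explicitly
p_explicit <- ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point() +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_color_discrete()

# A second scale for the same aesthetic replaces the first
# (ggplot2 prints a message to say so)
p_replaced <- p_explicit + scale_color_viridis_d()
```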

Changing scales

Continuous and discrete scale functions have some parameters that are often used:

  • name - change the title of the axis or legend
  • breaks - tick locations (on an axis) or legend keys
  • labels - customize labels on ticks or legend keys
  • limits - change the scale range

NOTE that the limits argument will drop data outside of the new limits, because of where this happens in the ggplot2 pipeline; see above.

cars

base <- ggplot(cars, aes(speed, dist)) + geom_point()
base

name

base + scale_x_continuous(name = "New name")

breaks

base + scale_x_continuous(breaks = c(7, 12, 13, 18, 24))

labels

base + scale_x_continuous(labels = c("", "This", "isn't", "a", "good", "idea"))

limits

Notice the warning that occurs here. Why?

Because by default, ggplot2 converts data outside the scale limits to NA, which can affect subsequent stats and geoms. This behavior can be changed; see scales::oob.

base + scale_x_continuous(limits = c(10, 20))
## Warning: Removed 13 rows containing missing values (geom_point).
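To see where the 13 rows go, and how to change that, here is a sketch using the out-of-bounds helpers from the scales package. It assumes a scales version that provides oob_censor(), oob_keep(), and oob_squish().

```r
library(ggplot2)
library(scales)

# The default handler replaces out-of-bounds values with NA
censored <- oob_censor(cars$speed, range = c(10, 20))
sum(is.na(censored))

base <- ggplot(cars, aes(speed, dist)) + geom_point()

# Alternative 1: keep out-of-bounds values as they are
p_keep <- base + scale_x_continuous(limits = c(10, 20), oob = oob_keep)

# Alternative 2: squish out-of-bounds values onto the nearest limit
p_squish <- base + scale_x_continuous(limits = c(10, 20), oob = oob_squish)
```

With oob_keep no data are discarded, so later stats see every observation; oob_squish moves the points, which is rarely what you want for position scales but can be useful for color.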

Scale labels

The example above used a manually specified vector of labels. We can also use a labelling function; the scales package provides some really useful ones:

library(scales)

label_dollar()(2004)
## [1] "$2,004"
label_comma()(2004)
## [1] "2,004"
label_percent()(2004)
## [1] "200 400%"
label_bytes()(2004)
## [1] "2 kB"

ggplot(cars, aes(speed, dist, color = speed)) + 
  geom_point() +
  scale_x_continuous(labels = label_percent()) +
  scale_y_continuous(labels = label_dollar()) +
  scale_color_continuous(labels = label_ordinal())

We can also write our own labeling function:

metric_label <- function(x) {
  # convert mpg to km/l
  round(x * 1.6 * 0.264172, 2)
}

ggplot(mpg, aes(displ, hwy)) + 
  geom_point() + 
  scale_y_continuous(labels = metric_label) +
  ylab("Km per liter")

Note that this only changes the tick labels on the axis; the underlying data and scale are untouched. We could also have created a custom scale or a custom coordinate transform (see below).
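The same relabelling can also be sketched with scales::label_number(), which takes a multiplier and a suffix so the unit appears on each tick. The conversion constant and the names mpg_to_kml and kml_label are assumptions for this example.

```r
library(ggplot2)
library(scales)

# 1 mile per US gallon is roughly 0.425 km per liter
mpg_to_kml <- 1.609344 * 0.264172

# A labelling function: multiply, round to one decimal, append the unit
kml_label <- label_number(scale = mpg_to_kml, accuracy = 0.1, suffix = " km/l")
kml_label(c(20, 30))

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(name = "Highway fuel economy", labels = kml_label)
```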

Scale transformations

Continuous data are by default mapped linearly from the data space onto the aesthetic space, but this can be overridden using transformations via the trans argument:

Name       Function \(f(x)\)             Inverse \(f^{-1}(y)\)
exp        \(e^x\)                       \(\log(y)\)
identity   \(x\)                         \(y\)
log        \(\log(x)\)                   \(e^y\)
log10      \(\log_{10}(x)\)              \(10^y\)
logit      \(\log(\frac{x}{1 - x})\)     \(\frac{1}{1 + e^{-y}}\)
pow10      \(10^x\)                      \(\log_{10}(y)\)
sqrt       \(x^{1/2}\)                   \(y^2\)

For many of these there are three ways to accomplish a scale-transformation task, e.g.:

  • scale_x_continuous(trans = "log10")
  • scale_x_log10()
  • aes(log10(x), ...)

These all do the same thing, transforming the data before any statistical summaries. Axis labeling may change however:

  • if you use a transformed scale, the axes will be labelled in the original data space
  • if you transform the data, the axes will be labelled in the transformed space

# untransformed
theme_update(axis.text.y = element_text(size = 20, face = "bold"),
             title = element_text(size = 20, face = "bold"))
df <- data.frame(x = 1:50, y = 1:50)

ggplot(df, aes(x, y, group = 1)) + 
  geom_boxplot() + geom_point() + 
  ggtitle("untransformed")

# manual transformation
ggplot(df, aes(x, log10(y), group = 1)) + 
  geom_boxplot() + geom_point() + 
  ggtitle("aes(log10(y))")

# transform using scales
ggplot(df, aes(x, y, group = 1)) + 
  geom_boxplot() + geom_point() + 
  scale_y_continuous(trans = "log10") + 
  ggtitle('trans = "log10"')

# coordinate transformation
ggplot(df, aes(x, y, group = 1)) + 
  geom_boxplot() + geom_point() + 
  coord_trans(y = "log10") +
  ggtitle('coord_trans(y = "log10")') + drk
theme_set(thm)
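The practical difference between the last two panels is when the statistics are computed. A sketch that checks this numerically, using the boxplot median and the same 1:50 data (scale_y_log10() is used here as the shortcut for the log10 scale transformation):

```r
library(ggplot2)

df <- data.frame(y = 1:50)
base <- ggplot(df, aes(x = "all", y = y)) + geom_boxplot()

# Scale transformation: data are transformed BEFORE the stat runs
p_scale <- base + scale_y_log10()

# Coordinate transformation: the stat runs on the raw data,
# and only the drawing is transformed afterwards
p_coord <- base + coord_trans(y = "log10")

layer_data(p_scale)$middle  # median of log10(y)
layer_data(p_coord)$middle  # median of y
```

So with the scale transformation the box summarizes log10(y), while with the coordinate transformation it summarizes y and is merely drawn on a warped axis.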

Less common is to transform color or size scales. An example from the ggplot2 book:

base <- ggplot(faithfuld, aes(waiting, eruptions)) + 
  geom_raster(aes(fill = density)) + 
  scale_x_continuous(NULL, NULL, expand = c(0, 0)) +
  scale_y_continuous(NULL, NULL, expand = c(0, 0))

base
base + scale_fill_continuous(trans = "sqrt")

We can also create our own custom scale transformations, but this is beyond the scope of this workshop.

Discrete scales

Generally ggplot will automatically map categorical data to a discrete scale, and we don’t have to think further about it.

I didn’t know that you can selectively replace categorical labels:

ggplot(mpg, aes(x = hwy, y = class)) + 
  geom_point() + 
  scale_y_discrete(labels = c(midsize = "Ben's car"))

Binned scales

I did not know these existed! They do.

base <- ggplot(mpg, aes(hwy, class)) + geom_count()

base 
base + scale_x_binned(n.breaks = 10)

Reordering categories

By default ggplot2 puts categorical variables into alphabetical order (or, for factors, the order of their levels), but you might want to change this; commonly, for example, we’d like to order by the value of some other variable.

This can be done in base R but the forcats package offers a super-simple method:

ggplot(mpg, aes(class, cty, color = class)) + 
  geom_jitter(show.legend = FALSE)

library(forcats)  # h/t SCP
ggplot(mpg, aes(fct_reorder(class, cty), cty, color = class)) + 
  geom_jitter(show.legend = FALSE)
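For reference, a base R sketch of the same reordering uses stats::reorder(). Note that reorder() summarizes with the mean by default, so FUN = median is passed here to mirror fct_reorder()'s default; the name class_by_cty is illustrative.

```r
library(ggplot2)

# Reorder the class levels by median city mileage
class_by_cty <- reorder(mpg$class, mpg$cty, FUN = median)
levels(class_by_cty)

ggplot(mpg, aes(reorder(class, cty, FUN = median), cty, color = class)) +
  geom_jitter(show.legend = FALSE) +
  xlab("class")
```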

Color

There are lots of ways to specify color; for example the HCL (hue-chroma-luminance) space.

# Example from http://sape.inf.usi.ch/quick-reference/ggplot2/colour
d <- expand.grid(h = seq(0, 350, 10), c = seq(0, 100, 5), l = seq(0, 100, 20))
ggplot(d) +
  coord_polar(theta = "x") + facet_wrap(~l) +
  scale_x_continuous(name = "hue", limits = c(0, 360), 
                     breaks = seq(5, 345, 20), labels = seq(0, 340, 20)) +
  scale_y_continuous(name = "chroma", breaks = seq(0, 100, 20)) +
  scale_fill_identity() +
  geom_rect(mapping = aes(xmin = h, 
                          xmax = h + resolution(h), 
                          ymin=c, 
                          ymax = c + resolution(c),  
                          fill = hcl(h, c, l)), 
            color = "white", size = 0.1)

To set a different continuous color scale:

  • scale_fill/color_gradient() produces a two-color gradient
  • scale_fill/color_gradient2() produces a three-color gradient with specified midpoint
  • scale_fill/color_gradientn() produces an n-color gradient
  • scale_fill/color_distiller() (for continuous data) and scale_fill/color_brewer() (for discrete data) provide color schemes from ColorBrewer

base <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster() +
  scale_x_continuous(NULL, expand = c(0, 0)) + 
  scale_y_continuous(NULL, expand = c(0, 0)) + 
  theme(legend.position = "none")

base
base + scale_fill_viridis_c()
base + scale_fill_viridis_c(option = "magma")
base + scale_fill_distiller(palette = "RdPu")
base + scale_fill_distiller(palette = "YlOrBr")
base + scale_fill_gradient2()

How do you know what colors to use? On a technical level, there are several useful tools:

  • Mac users have the “Digital Color Meter” in their Utilities folder; so handy!
  • R provides functions like hcl() that return the RGB (red-green-blue) hex code for a combination of hue, chroma, and luminance
  • scales::show_col() provides “A quick and dirty way to show colours in a plot”

scales::show_col(scales::hue_pal()(9))

More fundamentally, however, it’s time for…

Color crash course

“There are three fundamental use cases for color in data visualizations: (i) we can use color to distinguish groups of data from each other; (ii) we can use color to represent data values; and (iii) we can use color to highlight.” - Claus Wilke, Fundamentals of Data Visualization

The tl;dr of color theory is that color choices matter: good ones make figures more impactful and communicate research more effectively. Colors opposite each other on the color star above are complementary and look good together. Why? Because SCIENCE. Photoreceptor cells in your eyes like balance.

1. Be consistent

From: How to Choose Colors for Your Data Visualizations by Michael Yi

Keep variables the same color throughout a presentation or paper. This helps viewers understand and follow the story of your data throughout.

2. Be intentional

The following example is from Claus Wilke’s Fundamentals of Data Visualization book (which I highly recommend). Why is this figure “bad”?

“Even if with a lot of effort we can figure out exactly which state is which, this visualization defeats the purpose of coloring. We should use color to enhance figures and make them easier to read, not to obscure the data by creating visual puzzles.”

Coloring the points by region offers an additional way to interpret the data. Don’t color for the sake of coloring. Color should have a clear purpose in a figure.

3. Be inclusive

Many people (~10% of men) do not possess the normal complement of color receptors and so can distinguish fewer colors than usual.

  • https://colorbrewer2.org/ allows you to filter through colorbrewer palettes and select colorblind safe, print friendly, and/or photocopy safe colors
  • viridis is a popular color palette package that is colorblind-friendly
  • The colorblindr and dichromat packages show what your plot looks like in different color-vision-deficiency simulations
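As a starting point, here is a minimal sketch that swaps the default discrete palette for viridis using the scale built into ggplot2. The end = 0.9 setting, which trims the palest yellow, is just a stylistic assumption.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point(size = 2) +
  # Discrete viridis palette; use scale_color_viridis_c() for continuous data
  scale_color_viridis_d(option = "viridis", end = 0.9)
p
```

With seven classes even a colorblind-friendly palette is stretched thin, so consider pairing color with shape or facets when there are many groups.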

Guides

Legends and axes are collectively referred to as “guides” in the ggplot2 system, and are the inverse of scales: they map visual properties back to the underlying data.

The single most common Google search I perform is “ggplot remove legend”, so let’s get that out of the way:

  • Hide the entire legend: p + theme(legend.position = "none")
  • Hide the legend for a specific geometry: p + geom_xxx(show.legend = FALSE)

Legend positioning is changed via the theme system.

More generally, legends are complicated, both over and under the hood.

One useful trick to know about is the override.aes parameter of guide_legend().

base <- ggplot(mpg, aes(displ, hwy, color = class)) + 
  geom_point() +
  theme(legend.position = c(0.75, 0.75))

base
base + guides(colour = guide_legend(override.aes = list(size = 5)))

While working on this I wondered if the legend text entries could be colored to follow a plot aesthetic. Answer: it’s possible but a bit gnarly.

Coordinate systems

Most often we work with the default Cartesian coordinate system, coord_cartesian(). Two other common and useful systems are:

  • coord_flip() flips the x and y axes. Note that this is not the same thing as plotting y against x
  • coord_fixed() is the Cartesian system with a fixed aspect ratio

There are also specialized non-linear coordinate systems, in which the shortest path between two points may no longer be a straight line. These systems may thus change the very shape of geoms.

  • coord_map() and coord_sf() are for map projections
  • coord_polar() produces polar coordinates
  • coord_trans() allows you to apply arbitrary transformations to x and y

rect <- data.frame(x = 50, y = 50)
line <- data.frame(x = c(1, 200), y = c(100, 1))
base <- ggplot(mapping = aes(x, y)) + 
  geom_tile(data = rect, aes(width = 50, height = 50)) + 
  geom_line(data = line) + 
  xlab(NULL) + ylab(NULL)
base + ggtitle("Cartesian")
base + coord_flip() + ggtitle("coord_flip")
base + coord_polar("x") + ggtitle("Polar x")
base + coord_trans(y = "log10") + ggtitle('coord_trans(y = "log10")')

(Modified from an example in the ggplot2 book.)

EXERCISE: coordinate flipping

Is flipping the x and y coordinates the same thing as exchanging the x and y aesthetic mappings?

Use the cars dataset and geom_smooth() to test this.

p <- ggplot(cars, aes(speed, dist)) + geom_point() + geom_smooth()
p
p + coord_flip()
ggplot(cars, aes(dist, speed)) + geom_point() + geom_smooth() + drk

Coordinate limits

‘Zooming into’ a part of the plot is a coordinate transform, not a scale adjustment.

coord_cartesian(xlim = c(x1, x2))

coord_cartesian(ylim = c(y1, y2))

coord_cartesian(xlim = c(x1, x2), ylim = c(y1, y2))

theme_update(title = element_text(size = 20, face = "bold"))

p <- ggplot(mpg, aes(displ, cty)) + geom_point() + geom_smooth()
p + geom_vline(xintercept = 4, linetype = 2) + geom_vline(xintercept = 6, linetype = 2)
p + xlim(c(4, 6)) + drk + ggtitle("xlim")
p + scale_x_continuous(limits = c(4, 6)) + drk + ggtitle("scale limits")
p + coord_cartesian(xlim = c(4, 6)) + ggtitle("coord_cartesian")
theme_set(thm)
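The difference can also be checked numerically rather than by eye. The sketch below uses a linear smoother (an assumption, chosen to keep the fit simple and free of messages) and compares the x-range over which the fitted line was computed under each approach.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, cty)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

p_scale <- p + scale_x_continuous(limits = c(4, 6))  # drops data first
p_coord <- p + coord_cartesian(xlim = c(4, 6))       # zooms afterwards

# Layer 2 is the smoother; look at where its fitted values live
fit_scale <- suppressWarnings(layer_data(p_scale, 2))
fit_coord <- layer_data(p_coord, 2)

range(fit_scale$x)  # fitted only to the data inside the limits
range(fit_coord$x)  # fitted to all of the data
```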

Annotations

Good annotations guide your viewers and clarify the plot, so you should know ggplot2’s annotate() function.

For example, we may want to annotate individual points:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(data = subset(mpg, manufacturer == "subaru"), 
             colour = "orange",
             size = 3) +
  geom_point() + 
  annotate(geom = "curve", 
           x = 4, y = 35, xend = 2.65, yend = 27, 
           curvature = 0.3, arrow = arrow(length = unit(2, "mm"))) +
  annotate(geom = "text", x = 4.1, y = 35, label = "subaru", hjust = "left")

This can be very effective across facets:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(data = mpg[c("displ", "hwy")], color = "grey") +
  geom_point(aes(colour = factor(cyl))) +
  facet_wrap(~cyl)

Shading can provide a very effective way to annotate data:

pres <- subset(presidential, start >= economics$date[1])

ggplot(economics) + 
  geom_rect(data = pres, 
            alpha = 0.3, 
            # notice the "-Inf" and "Inf" y limits here
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = party)) +
  geom_line(aes(date, unemploy / pop)) + 
  scale_fill_manual(values = c("blue", "red")) +
  scale_y_continuous(labels = label_percent()) +
  geom_text(data = pres, angle = 30, size = 3, hjust = 0, nudge_x = 50,
            aes(x = start, y = 0.025, label = name))

Other useful packages

The ggrepel package provides a useful variant on geom_text that attempts to space text labels out nicely:

base <- ggplot(mtcars,
  aes(wt, mpg, label = rownames(mtcars), colour = factor(cyl))) +
  geom_point() + 
  theme(legend.position = "none")

base + geom_text()
base + ggrepel::geom_text_repel()

We can also use the ggforce package to label entire groups:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) + 
  ggforce::geom_mark_ellipse(aes(label = cyl, group = cyl))

ggpmisc provides a geom_table() and (like the cowplot and patchwork packages) facilities for making inset tables and plots:

library(ggpmisc)
## 
## Attaching package: 'ggpmisc'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
tb <- aggregate(list(hwy = mpg$hwy, cty = mpg$cty), by = list(cyl = mpg$cyl), median)
data.tb <- tibble::tibble(x = 6, y = 44, tb = list(tb))

ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
  geom_table(data = data.tb, aes(x, y, label = tb)) +
  geom_point() 

Math

We can include arbitrary mathematical expressions in almost any text in ggplot2: axis and legend titles, facet titles, legend entries. This uses R’s plotmath facility, which interprets a text argument as a mathematical expression.

This page has a handy list of plotmath examples; see also ?plotmath.

How this is implemented in ggplot2 can feel frustratingly arbitrary, and I find myself experimenting and googling a lot:

ggplot(mpg, aes(displ, hwy, color = class)) + geom_point() +
  # Axis labels will parse expressions by default
  xlab(expression(paste(Displacement~hat(x), " (in", phantom()^3, ")"))) +
  # Another way
  labs(y = quote(Highway~(mu))) +
  scale_color_discrete(labels = c(expression(alpha), 
                                  expression(beta), 
                                  expression(gamma)))
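Text drawn inside the panel works a little differently: text geoms take the label as a character string and only treat it as plotmath when parse = TRUE. A sketch (the label and its position are arbitrary; ggplot2::annotate() is written out in full because ggpmisc masks annotate() once it is loaded):

```r
library(ggplot2)

# A plotmath expression stored as a plain string
eq_label <- "hat(y) == beta[0] + beta[1] * x"

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # parse = TRUE tells the text geom to interpret the string as plotmath
  ggplot2::annotate("text", x = 5.5, y = 40, label = eq_label, parse = TRUE)
p
```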

It is also possible to include Markdown in axis and legend titles using the ggtext package.

library(ggtext)
df <- data.frame(x = 1:3, y = 1:3)
base <- ggplot(df, aes(x, y)) + 
  geom_point() + 
  labs(x = "Axis title with *italics* and **boldface**") +
  theme(axis.title.x = element_text(size = 20))

base 
base + theme(axis.title.x = ggtext::element_markdown())

End

Things we didn’t talk about:

  • Maps, networks, 3D plots
  • Polar coordinates (much)
  • Assembling multiple plots together. My go-to is cowplot but the newer patchwork is worth checking out.

Resources