Reference book - ggplot2: Elegant Graphics for Data Analysis

Laying out plots side by side

Often, one wants to show two or more plots side by side to show different aspects of the same story in a compelling way. This is the scenario that patchwork was build to solve. At its heart, patchwork is a package that extends ggplot2’s use of the + operator to work between multiple plots, as well as add additional operators for specialized compositions and working with compositions of plots.

As an example of the most basic use of patchwork, we’ll use the following 4 plots of the mpg dataset.

library(ggplot2)
p1 <- ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy))

p2 <- ggplot(mpg) + 
  geom_bar(aes(x = as.character(year), fill = drv), position = "dodge") + 
  labs(x = "year")

p3 <- ggplot(mpg) + 
  geom_density(aes(x = hwy, fill = drv), colour = NA) + 
  facet_grid(rows = vars(drv))

p4 <- ggplot(mpg) + 
  stat_summary(aes(x = drv, y = hwy, fill = drv), geom = "col", fun.data = mean_se) +
  stat_summary(aes(x = drv, y = hwy), geom = "errorbar", fun.data = mean_se, width = 0.5)

library(patchwork)
p1 + p2

+ does not specify any specific layout, only that the plots should be displayed together. In the absence of a layout the same algorithm that governs the number of rows and columns in facet_wrap() will decide the number of rows and columns. This means that adding 3 plots together will create a 1x3 grid while adding 4 plots together will create a 2x2 grid.

Motive of the Graphs: * Overall Purpose: To explore and compare characteristics of cars in the mpg dataset. Left plot: Shows the relationship between engine size and fuel efficiency. Right plot: Compares the number of car models by drive type across two different years. Why these plots together? They provide a quick overview of both continuous relationships (engine size vs. fuel efficiency) and categorical distributions (drive type by year) in the dataset. This helps in understanding trends in car design and technology over time, and how they relate to fuel efficiency and drivetrain choices.

Summary: The graphs help you understand how engine size affects fuel efficiency, and how the types of drivetrains offered by manufacturers have changed (or not) between 1999 and 2008.

p1 + p2 + p3 + p4

As can be seen from the two examples above, patchwork takes care of aligning the different parts of the plots with each other. You can see that all plotting regions are aligned, even in the presence of faceting. Further, you can see that the y-axis titles in the two left-most plots are aligned despite the axis text in the bottom left plot being wider.

Overall Motive of the Graphs: These four plots together provide a comprehensive overview of the relationships between engine size, fuel efficiency, drive type, and year in the mpg dataset: Scatter plot: Shows the general trend between engine size and fuel efficiency. Bar plot (year vs drv): Shows how the types of drivetrains are distributed across years. Density plots: Show how fuel efficiency is distributed within each drive type. Mean bar plot: Summarizes the average fuel efficiency for each drive type, with uncertainty. Purpose: To help you understand how car characteristics (engine size, drive type, year) relate to fuel efficiency, and how these characteristics are distributed in the dataset. This is useful for exploring trends, making comparisons, and generating hypotheses about what factors influence fuel economy in cars.

What are Error Bars? Error bars are graphical representations of the variability or uncertainty in your data. They show how precise a measurement is, or how much the values might vary if you repeated the experiment or sampling.

What do Error Bars Mean? In the context of your bar plot: The error bars on the bars for each drv (drive type) show the uncertainty around the mean highway miles per gallon (hwy) for that group. They usually represent: Standard error of the mean (SEM): How much the sample mean is expected to vary from the true population mean. Confidence intervals (often 95%): A range where we are fairly sure the true mean lies. Standard deviation: The spread of the data (less common for bar plots of means).

Why Use Error Bars? To show reliability: They help you see if differences between groups (e.g., drive types) are likely to be meaningful or just due to random variation. To visualize uncertainty: They remind us that the mean is just an estimate, not a perfect value. To compare groups: If error bars for two groups do not overlap much, it suggests a real difference between the groups.

How to Interpret Error Bars: Length of the Error Bar: A short error bar means the data points are close to the mean (less variability, more confidence in the mean). A long error bar means the data points are more spread out (more variability, less confidence in the mean). Comparing Groups: If the error bars of two groups do not overlap, it suggests a statistically significant difference between the group means. If the error bars overlap a lot, the difference between the means may not be statistically significant. What the Error Bar Represents: If it’s a standard error or confidence interval, it shows the uncertainty in the estimate of the mean. If it’s a standard deviation, it shows the spread of the data (but not uncertainty in the mean).

Taking control of the layout

It is often that the automatically created grid is not what you want and it is of course possible to control it. The most direct and powerful way is to do this is to add a plot_layout() specification to the plot:

p1 + p2 + p3 + plot_layout(ncol = 2)

A common scenario is wanting to force a single row or column. patchwork provides two operators, | and / respectively, to facilitate this (under the hood they simply set number of rows or columns in the layout to 1).

p1 / p2

# Basically the same as using `+` but the intend is clearer
p3 | p4

patchwork allows nesting layouts which means that it is possible to create some very intricate layouts using just these two operators

#Use | to arrange plots in a row. Use / to arrange plots in a column.
p3 | (p2 / (p1 | p4))

layout <- "
AAB
C#B
CDD
"

p1 + p2 + p3 + p4 + plot_layout(design = layout)

Is has been apparent in the last couple of plots the legend often becomes redundant between plots. While it is possible to remove the legend in all but one plot before assembling them, patchwork provides something easier for the common case:

What does the layout string mean?

Each character (A, B, C, D, #) represents a cell in a grid. Each letter (A, B, C, D) refers to a specific plot (e.g., p1, p2, p3, p4). The # symbol means an empty cell (no plot). The rows and columns are defined by the arrangement of characters.

Visual Representation: | | 1 | 2 | 3 | |—|—|—|—| | 1 | A | A | B | | 2 | C | # | B | | 3 | C | D | D | A occupies two cells in the first row. B occupies the top right and middle right. C occupies the left column in rows 2 and 3. D occupies the bottom right two cells. * # is an empty space (no plot).

What is the motive of this layout? Custom arrangement: It allows you to create complex, non-rectangular layouts for your plots, not just simple grids. Highlighting relationships: You can make some plots larger (by spanning multiple cells) or position them to emphasize relationships or importance. Professional presentation: Useful for making publication-quality figures where you want precise control over plot arrangement.

Why and when use it? When you need more than a simple grid: If you want some plots to be bigger, or to have empty spaces, or to arrange plots in a way that tells a better story. For composite figures: When preparing figures for papers, posters, or presentations, and you need to combine multiple plots in a visually appealing and informative way. To match journal requirements: Some journals require specific figure layouts.

p1 + p2 + p3 + plot_layout(ncol = 2, guides = "collect")

Electing to collect guides will take all guides and put them together at the position governed by the global theme. Further, it will remove any duplicate guide leaving only unique guides in the plot. The duplication detection looks at the appearance of the guide, and not the underlying scale it comes from. Thus, it will only remove guides that are exactly alike. If you want to optimize space use by putting guides in an empty area of the layout, you can specify a plotting area for collected guides:

p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect")

# Modifying subplots One of the tenets of patchwork is that the plots remain as standard ggplot objects until rendered. This means that they are amenable to modification after they have been assembled. The specific plots can by retrieved and set with [[]] indexing:

p12 <- p1 + p2
p12[[2]] <- p12[[2]] + theme_light()
p12

Often though, it is necessary to modify all subplots at once to e.g. give them a common theme. patchwork provides the & for this scenario:

p1 + p4 & theme_minimal()

This can also be used to give plots a common axis if they share the same aesthetic on that axis:

p1 + p4 & scale_y_continuous(limits = c(0, 45))

# Adding annotation

Once plots have been assembled, they start to form a single unit. This also means that titles, subtitles, and captions will often pertain to the full ensemble and not individual plots. Titles etc. can be added to patchwork plots using the plot_annotation() function.

p34 <- p3 + p4 + plot_annotation(
  title = "A closer look at the effect of drive train in cars",
  caption = "Source: mpg dataset in ggplot2"
)
p34

The titles formatted according to the theme specification in the plot_annotation() call.

p34 + plot_annotation(theme = theme_gray(base_family = "mono"))

As the global theme often follows the theme of the subplots, using & along with a theme object will modify the global theme as well as the themes of the subplots

p34 & theme_gray(base_family = "mono")

Another type of annotation, known especially in scientific literature, is to add tags to each subplot that will then be used to identify them in the text and caption. ggplot2 has the tag element for exactly this and patchwork offers functionality to set this automatically using the tag_levels argument. It can generate automatic levels in latin characters, arabic numerals, or roman numerals

p123 <- p1 | (p2 / p3)
p123 + plot_annotation(tag_levels = "I") # Uppercase roman numerics

An additional feature is that it is possible to use nesting to define new tagging levels:

p123[[2]] <- p123[[2]] + plot_layout(tag_level = "new")
p123 + plot_annotation(tag_levels = c("I", "a"))

As can be seen, patchwork offers a long range of possibilities when it comes to arranging plots, and the API scales with the level of complexity of the assembly, from simply using + to place multiple plots in the same area, to using nesting, layouts, and annotations to create advanced custom layouts.

Arranging plots on top of each other

While a lot of the functionality in patchwork is concerned with aligning plots in a grid, it also allows you to make insets, i.e. small plots placed on top of another plot. The functionality for this is wrapped in the inset_element() function which serves to mark the given plot as an inset to be placed on the preceding plot, along with recording the wanted placement etc. The basic usage is like this:

p1 + inset_element(p2, left = 0.5, bottom = 0.4, right = 0.9, top = 0.95)

The position is specified by given the left, right, top, and bottom location of the inset. The default is to use npc units which goes from 0 to 1 in the given area, but any grid::unit() can be used by giving them explicitly. The location is by default set to the panel area, but this can be changed with the align_to argument. Combining all this we can place an inset exactly 15 mm from the top right corner like this:

p1 + 
  inset_element(
    p2, 
    left = 0.4, 
    bottom = 0.4, 
    right = unit(1, "npc") - unit(15, "mm"), 
    top = unit(1, "npc") - unit(15, "mm"),
    align_to = "full"
  )

insets are not confined to ggplots. Any graphics supported by wrap_elements() can be used, including patchworks:

p24 <- p2 / p4 + plot_layout(guides = "collect")
p1 + inset_element(p24, left = 0.5, bottom = 0.05, right = 0.95, top = 0.9)

A nice feature of insets is that they behave as standard patchwork subplots until they are rendered. This means that they are amenable to modifications after assembly, e.g. using &:

p12 <- p1 + inset_element(p2, left = 0.5, bottom = 0.5, right = 0.9, top = 0.95)
p12 & theme_bw()

And auto tagging works as expected as well:

p12 + plot_annotation(tag_levels = "A")

https://patchwork.data-imaginist.com/

Arranging plots

Shefayet Hossain Monna

2025-05-13

Reference book - ggplot2: Elegant Graphics for Data Analysis

Laying out plots side by side

Taking control of the layout

What does the layout string mean?

Arranging plots on top of each other