A multi-figure plot is often created to promote comparison of the distribution of one variable across values of another; usually, a factor or limited-range, discrete numeric variable. R provides multiple options for approaching this task. We’ll look at a few. Specifically, we’ll look at methods for iterating over levels() of a factor variable in base R, split()-ing the data frame, the facet_grid() and facet_wrap() functions from ggplot2, and trellis graphics via the “lattice” package.

We will use the “mtcars” dataset for our examples, representing the distribution of horsepower (“hp”) over different cylinder-counts (“cyl”).

# Load "mtcars" dataset
data(mtcars)

# Encode "cyl" and "am" as factors
mtcars[, "cyl"] <- factor(mtcars[, "cyl"])
mtcars[, "am"] <- factor(mtcars[, "am"], labels = c("automatic", "manual"))
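A quick check of the encoding can help before plotting. The factor levels set the panel order in every method below, and the groups differ considerably in size, so raw counts on the y-axis are not directly comparable across panels. The following is a minimal sketch using only base R:

```r
# Load and encode the data as at the start of the post
data(mtcars)
mtcars[, "cyl"] <- factor(mtcars[, "cyl"])
mtcars[, "am"] <- factor(mtcars[, "am"], labels = c("automatic", "manual"))

# Factor levels determine the panel order
levels(mtcars$cyl)  # "4" "6" "8"
levels(mtcars$am)   # "automatic" "manual"

# Number of cars per cylinder count
cyl_counts <- table(mtcars$cyl)
cyl_counts          # 4: 11, 6: 7, 8: 14
```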

Base R

For-loop

To start, we’ll simply loop over the levels() of “cyl”.

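# Arrange three panels side by side, leaving an outer margin at the bottom
# for a shared x-axis label
par(mfrow = c(1, 3), oma = c(2, 0, 0, 0))

for(i in levels(mtcars$cyl)) {
  hist(mtcars[mtcars$cyl == i, "hp"],
       xlab = "",
       col = "gray50",
       main = paste(i, "cylinders"))
}

# Add a common x-axis label in the outer margin
mtext("Horsepower (hp)", side = 1, outer = TRUE, line = 0.5)

# Restore the default layout and outer margins
par(mfrow = c(1, 1), oma = c(0, 0, 0, 0))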

split()

Within base R, we’re really looking at different methods for passing the relevant subsets of the data (here, “hp” according to “cyl”) to hist(). The split() function divides a vector, “x,” into groups defined by a factor, “f,” and returns a list with one vector per group. We can then draw from the resulting list, one vector of horsepowers per level of “cyl,” to produce our plots.
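The following minimal sketch shows what split() returns. The result is a named list whose names are the levels of “cyl”, so elements can be pulled out by position or by name:

```r
data(mtcars)
mtcars[, "cyl"] <- factor(mtcars[, "cyl"])

# One numeric vector of "hp" per level of "cyl", named after the levels
split_hps <- split(x = mtcars$hp, f = mtcars$cyl)

names(split_hps)                       # "4" "6" "8"
vapply(split_hps, length, integer(1))  # 11, 7 and 14 cars
split_hps[["6"]]                       # horsepower of the 6-cylinder cars
```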

Either a for-loop or one of the *apply() functions (e.g. lapply()) would work here, since iterating over the elements of a list is straightforward. Instead, we’ll call hist() three separate times. This means repeating many arguments, but it also gives us more control over the appearance of each panel.
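For comparison, the iterating approach can be sketched with the split() list. Map(), a wrapper around mapply(), walks the list values and their names in parallel and passes each pair to hist(). The layout mirrors the for-loop version. Keeping the returned "histogram" objects is optional; it is done here only so they can be inspected:

```r
data(mtcars)
mtcars[, "cyl"] <- factor(mtcars[, "cyl"])
split_hps <- split(x = mtcars$hp, f = mtcars$cyl)

# Save current settings and arrange three panels side by side
op <- par(mfrow = c(1, 3))

# Draw one histogram per list element; hist() also returns its computed bins
hists <- Map(function(hps, lvl) {
  hist(hps, xlab = "Horsepower (hp)", col = "gray50",
       main = paste(lvl, "cylinders"))
}, split_hps, names(split_hps))

# Restore the previous graphics settings
par(op)
```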

# Create the split() object
split_hps <- split(x = mtcars$hp, f = mtcars$cyl)

# Divide the device into three adjacent figure regions; fig = c(x1, x2, y1, y2)
# in normalized device coordinates. We'll also shrink the margins between figures.
par(fig = c(0, .333, 0, 1), mar = c(5,4,3,0.5))

# cyl == "4" histogram
hist(x = split_hps[[1]], xlab = "", main = "", ylab = "Frequency",
     las = 1, col = "gray50")

# Rather than a main title, we'll use mtext() ('margin text') to place the panel
# label in the top margin.
mtext(text = "4 cylinders", side = 3)

# Create 2nd figure region
par(fig = c(.333, 0.666, 0, 1), mar = c(5,0.1,3,0.5), new = TRUE)

# cyl == "6" histogram, axes and labels suppressed
hist(x = split_hps[[2]], ann = FALSE, axes = FALSE, col = "gray50")

# Move x-axis above plot area
axis(3)

# Add subtitles via title(): one for the common x-axis label, at the bottom; one
# for the panel label, just below the plot area.
title(sub = "Horsepower (hp)", outer = FALSE, line = 3)
title(sub = "6 cylinders", outer = FALSE, line = 0)

# Draw a bare y-axis line (no tick marks, no labels)
axis(2, lwd.ticks = 0, labels = FALSE)

# Create 3rd figure region
par(fig = c(0.666, 1, 0, 1), mar = c(5, 0.1, 3, 2), new = TRUE)

# cyl == "8" histogram, axes and labels suppressed
hist(x = split_hps[[3]], ann = FALSE, axes = FALSE, col = "gray50")

# Add x-axis
axis(1)

# Draw a bare y-axis line (no tick marks, no labels)
axis(2, lwd.ticks = 0, labels = FALSE)

# Add plot label via mtext()
mtext(text = "8 cylinders", side = 3)
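One caveat with the figure above is that the second and third panels hide their y-axis labels, yet each hist() call still picks its own y-range. Equal bar heights therefore need not mean equal counts. One possible fix is to compute the counts without drawing (plot = FALSE), take the largest, and pass the same ylim to every hist() call. The following is a minimal sketch showing the first panel. The name shared_ylim is only illustrative, and the other two hist() calls would take the same ylim:

```r
data(mtcars)
mtcars[, "cyl"] <- factor(mtcars[, "cyl"])
split_hps <- split(x = mtcars$hp, f = mtcars$cyl)

# Compute histogram counts for each group without drawing anything
counts <- lapply(split_hps, function(x) hist(x, plot = FALSE)$counts)

# Common y-axis range covering the tallest bar in any panel
shared_ylim <- c(0, max(unlist(counts)))

# First panel as before, now with the shared limits
par(fig = c(0, 0.333, 0, 1), mar = c(5, 4, 3, 0.5))
hist(x = split_hps[[1]], xlab = "", main = "", ylab = "Frequency",
     las = 1, col = "gray50", ylim = shared_ylim)
mtext(text = "4 cylinders", side = 3)
```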

“ggplot2” package

The “ggplot2” package includes functions for ‘facetting’: creating and laying out a panel of the same plot for each value, or combination of values, of one or more variables. Here, again, we’ll use “cyl” as the facetting variable for histograms of “hp”. Note that we pass to “bins” the number of classes given by nclass.Sturges(), the default method hist() uses for determining the number of breaks, minus 1. Additionally, we specify theme_bw().
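The bin count can be traced as follows. nclass.Sturges() implements Sturges’ rule, ceiling(log2(n) + 1), which hist() uses by default (breaks = "Sturges") as a suggested number of classes before rounding the breakpoints to pretty values. With the 32 cars in “mtcars”:

```r
data(mtcars)

# Sturges' rule: ceiling(log2(32) + 1) = 6 classes
n_sturges <- nclass.Sturges(mtcars$hp)

# The value passed to geom_histogram(bins = ...) in the facetted plot
n_bins <- n_sturges - 1

n_sturges  # 6
n_bins     # 5
```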

Facetting

For reference, “ggplot2” provides two facetting functions, facet_grid() and facet_wrap(). The primary difference is that facet_grid() lays the panels out as a grid of rows and columns and creates every panel, even if, for a given value or combination of values, the panel is empty. facet_wrap() creates only those panels with values to be plotted, wrapping them across rows as needed.

library(ggplot2)

ggplot(data = mtcars, aes(x = hp)) +
  geom_histogram(colour = "gray50", bins = nclass.Sturges(mtcars$hp) - 1) +
  labs(x = "Horsepower (hp)", y = "Frequency") +
  facet_grid( ~ cyl, scales = "free") +
  theme_bw()
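The difference between the two facetting functions only shows up when some combination of values is missing. This sketch assumes “gear” as a second facetting variable, which the original example does not use. “mtcars” has no 8-cylinder cars with 4 gears, so facet_grid(gear ~ cyl) draws a 3 x 3 grid with one empty panel, while facet_wrap(~ gear + cyl) draws only the 8 combinations that occur:

```r
library(ggplot2)

data(mtcars)
mtcars[, "cyl"] <- factor(mtcars[, "cyl"])
mtcars[, "gear"] <- factor(mtcars[, "gear"])

# Cross-tabulation: the gear == 4, cyl == 8 cell is 0
gear_by_cyl <- table(mtcars$gear, mtcars$cyl)
gear_by_cyl

base_plot <- ggplot(data = mtcars, aes(x = hp)) +
  geom_histogram(colour = "gray50", bins = nclass.Sturges(mtcars$hp) - 1) +
  labs(x = "Horsepower (hp)", y = "Frequency") +
  theme_bw()

# Full grid of gear (rows) by cyl (columns), including the empty panel
p_grid <- base_plot + facet_grid(gear ~ cyl)

# Only the combinations present in the data
p_wrap <- base_plot + facet_wrap(~ gear + cyl)

print(p_grid)
print(p_wrap)
```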

position = “dodge”

Alternatively, with “ggplot2,” we can create fill-differentiated histograms, representing a numeric variable across the levels of a factor variable on a common axis. Rather than specifying facets, we map ‘group’ and ‘fill’ to the factor and specify “position”. The “mtcars” dataset includes limited observations over a relatively large range of horsepower, leaving us with very sparse histograms when we use a common axis for all groups. The code for the hp ~ am plots (horsepower by transmission type) is included below, but commented out. Instead, we’ll look to the “CO2” dataset, also from the “datasets” package, and plot CO2 uptake as a function of “Type,” i.e. whether a plant originates from “Quebec” or “Mississippi,” uptake ~ Type.

# ggplot(data = mtcars, aes(x = hp, group = am, fill = am)) +
#  geom_histogram(position = "dodge") +
#  labs(x = "Horsepower (hp)", y = "Frequency", fill = "Transmission") +
#  theme_bw() +
#  theme(legend.position = "bottom")

data(CO2)

ggplot(data = CO2, aes(x = uptake, group = Type, fill = Type)) +
  geom_histogram(position = "dodge", bins = 15) +
  labs(x = expression(paste(CO[2], " uptake (", mu, "mol ", m^{-2}, " ", s^{-1}, ")")),
       y = "Frequency", fill = "Origin") +
  theme_bw()

position = “identity”

Rather than “dodging” the bins so that the counts for each group are plotted next to one another, we can still fill-differentiate by factor value but plot each count in the same place. The bars will overlap, so we pass an “alpha” below 1 (0 is fully transparent, 1 fully opaque) to make them semi-transparent.

# ggplot(data = mtcars, aes(x = hp, group = am, fill = am)) +
#   geom_histogram(position = "identity", alpha = 0.5) +
#   labs(x = "Horsepower (hp)", y = "Frequency", fill = "Transmission") +
#   theme_bw() +
#   theme(legend.position = "bottom")

ggplot(data = CO2, aes(x = uptake, group = Type, fill = Type)) +
  geom_histogram(position = "identity", bins = 15, alpha = 0.5) +
       y = "Frequency", fill = "Origin") +
       y = "Frequency", fill = "Origin") +
  theme_bw()
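geom_histogram() stacks fill groups by default (position = "stack"), so "identity" has to be requested explicitly if the bars should overlap rather than pile up. The following sketch compares the two on the same “CO2” data. Inspecting the computed layer data with layer_data() is only a check and is not needed for plotting. Stacked bars reach the combined count per bin, whereas overlapping bars reach only each group's own count:

```r
library(ggplot2)
data(CO2)

# Default position: the bars for each Type are stacked on top of one another
p_stack <- ggplot(data = CO2, aes(x = uptake, fill = Type)) +
  geom_histogram(bins = 15) +
  theme_bw()

# Overlapping, semi-transparent bars
p_identity <- ggplot(data = CO2, aes(x = uptake, fill = Type)) +
  geom_histogram(position = "identity", bins = 15, alpha = 0.5) +
  theme_bw()

# Tallest bar top in each version
max(layer_data(p_stack)$ymax)
max(layer_data(p_identity)$ymax)
```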

“lattice” package

Lastly, the “lattice” package provides an implementation of Trellis Graphics in R. “lattice,” to me, is well-suited to multi-panel, comparative plots; not dissimilar in default appearance to ggplot2’s ‘facetting.’

library(lattice)

histogram( ~ hp | cyl, data = mtcars,
           type = "count",
           # breaks = nclass.Sturges(mtcars$hp),
           # scales = "free",
           xlab = "Horsepower (hp)",
           ylab = "Frequency")
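Conditioning on more than one variable works much like facet_grid(). The conditioning terms after | are joined with *. The following minimal sketch conditions on both “cyl” and “am” (the transmission factor encoded at the start). layout = c(3, 2) requests three columns and two rows. Because lattice functions return a "trellis" object rather than drawing immediately, print() is needed when running as a script:

```r
library(lattice)

data(mtcars)
mtcars[, "cyl"] <- factor(mtcars[, "cyl"])
mtcars[, "am"] <- factor(mtcars[, "am"], labels = c("automatic", "manual"))

# One panel per combination of cylinder count and transmission type
p <- histogram(~ hp | cyl * am, data = mtcars,
               type = "count",
               layout = c(3, 2),
               xlab = "Horsepower (hp)",
               ylab = "Frequency")

# Printing the trellis object draws the plot
print(p)
```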