The “ggplot2” package defines two functions, facet_grid() and facet_wrap(), that let us easily create multi-panel displays around a “facetting” variable. This is convenient for evaluating the distribution or behavior of one or more variables across the values of one or more other variables.
Here, we’ll consider “Petal.Length” by “Species” from the “iris” dataset.
library(ggplot2)
data(iris)
a <- ggplot(data = iris, aes(x = Petal.Length)) + geom_histogram(bins = 15, alpha = 0.75) +
  labs(y = "Frequency") + facet_grid(. ~ Species) +
  theme_bw()
a
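The example above uses facet_grid(). As a minimal sketch, the same histogram can also be laid out with facet_wrap(), which wraps the panels into rows and columns. The value ncol = 2 and the object name p_wrap are arbitrary choices for illustration, not part of the original example.

```r
library(ggplot2)

# Same histogram as above, but with the panels wrapped into two columns.
# `ncol = 2` and the name `p_wrap` are illustrative choices.
p_wrap <- ggplot(data = iris, aes(x = Petal.Length)) +
  geom_histogram(bins = 15, alpha = 0.75) +
  labs(y = "Frequency") +
  facet_wrap(~ Species, ncol = 2) +
  theme_bw()
p_wrap
```

facet_wrap() tends to be the more convenient choice when there is a single facetting variable with many levels, since the panels do not have to share one row or one column.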
Mapping, thematic and other elements are automatically shared across figures, but we can also add figure-specific features. To the histogram above, we’ll add a dashed, red line indicating the median “Petal.Length” per “Species.”
To do this, we need to do two (2) things:
# Create a data frame with the levels() of Species - our facetting variable - and the
# median Petal.Lengths:
vline_df <- data.frame(Species = levels(iris$Species),
                       Medians = tapply(X = iris$Petal.Length, INDEX = iris$Species,
                                        FUN = median))
# Add the median lines to the plot created above:
a + geom_vline(data = vline_df, aes(xintercept = Medians), linetype = "dashed",
               colour = "red4")
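As an alternative sketch, the same per-species medians can be computed with base R's aggregate(), which returns a data frame directly. The name vline_df2 is hypothetical, chosen here to avoid overwriting vline_df.

```r
# aggregate() returns one row per Species with the median Petal.Length.
vline_df2 <- aggregate(Petal.Length ~ Species, data = iris, FUN = median)

# Rename the value column to match the column name used above.
names(vline_df2)[2] <- "Medians"
vline_df2
```

This data frame can be passed to geom_vline(data = ..., aes(xintercept = Medians)) in the same way as vline_df, because it carries a Species column whose values match the facetting variable.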
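Older versions of “ggplot2” did not have a built-in function for adding a “theoretical” line to a QQ plot, i.e. an equivalent to the base R qqline(); since version 3.0.0, geom_qq_line() and stat_qq_line() fill that role. Either way, we can add such a line ourselves by passing an appropriate slope and intercept to geom_abline(), which also makes it clear how the line is constructed.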
# First, we'll create a value|type data frame for our example displays:
x <- c(rnorm(n = 250, mean = 0, sd = 0.5),
       rnorm(n = 250, mean = 0, sd = 1),
       rnorm(n = 250, mean = 0, sd = 2),
       rnorm(n = 250, mean = 0, sd = 3))
df <- data.frame(x,
                 y = rep(c("A", "B", "C", "D"), each = 250))
# Second, we'll calculate the slope and intercept of the theoretical quantile-quantile line:
a <- quantile(df$x, probs = c(0.25, 0.75))
b <- qnorm(c(0.25, 0.75))
slope <- diff(a)/diff(b)
int <- a[1L] - slope * b[1L]
# Now, we'll create a QQ plot, including all "x," undifferentiated by "y":
ggplot(data = df, aes(sample = x)) + stat_qq() +
  geom_abline(slope = slope, intercept = int, linetype = "dashed", colour = "red4") +
  theme_bw()
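For comparison, ggplot2 releases from 3.0.0 onward ship geom_qq_line(), which by default draws a line through the first and third quartiles, i.e. the same construction as above. The sketch below assumes such a version is installed; set.seed() and the names df_qq and p_qq are illustrative additions.

```r
library(ggplot2)

set.seed(1)  # illustrative seed so the simulated data are reproducible
df_qq <- data.frame(
  x = c(rnorm(250, sd = 0.5), rnorm(250, sd = 1),
        rnorm(250, sd = 2), rnorm(250, sd = 3)),
  y = rep(c("A", "B", "C", "D"), each = 250)
)

# geom_qq_line() computes the quartile-based reference line itself,
# so no slope or intercept has to be calculated by hand.
p_qq <- ggplot(data = df_qq, aes(sample = x)) +
  stat_qq() +
  geom_qq_line(linetype = "dashed", colour = "red4") +
  theme_bw()
p_qq
```

Because the line is computed by a stat, adding facet_grid(. ~ y) to this plot should give each panel its own line without a separate data frame.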
To add facet-specific lines, we need to do the same as we did for the iris per-Species medians: define a data frame with the relevant levels and values. Here, our levels are those of our facetting variable, “y,” and the relevant values are the slopes and intercepts calculated from “x,” per “y.”
# First, we need to calculate the slopes and intercepts. We'll do this with a for-loop,
# iterating across the levels of "y":
slopes <- numeric()
ints <- numeric()
for (i in levels(factor(df$y))) {
  slopes[i] <- diff(quantile(df[df$y == i, 1],
                             probs = c(0.25, 0.75))) / diff(qnorm(c(0.25, 0.75)))
  ints[i] <- quantile(df[df$y == i, 1],
                      probs = c(0.25)) - slopes[i] * qnorm(c(0.25))
}
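# Second, we'll create a data frame for our facet-specific geom_abline()s.
# We take the levels from names(slopes): under R >= 4.0, df$y is a character
# vector rather than a factor, so levels(df$y) would return NULL:
ab_df <- data.frame(y = names(slopes),
                    a = slopes,
                    b = ints)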
# Finally, our plot:
ggplot(data = df, aes(sample = x)) + stat_qq() +
  geom_abline(data = ab_df, aes(slope = a, intercept = b), linetype = "dashed",
              colour = "red4") +
  facet_grid(. ~ y)
# NOTE: for facet-specific lines, we must pass our abline data frame to
# geom_abline() and map its slope and intercept columns inside aes(),
# rather than passing fixed values to the "slope" and "intercept" arguments.
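The for-loop can also be replaced by a small helper applied to each group. The following is a sketch in base R under stated assumptions: qq_coef() is a hypothetical helper name, the data are re-simulated as df2, and set.seed() is added only for reproducibility.

```r
set.seed(1)  # illustrative seed
df2 <- data.frame(
  x = c(rnorm(250, sd = 0.5), rnorm(250, sd = 1),
        rnorm(250, sd = 2), rnorm(250, sd = 3)),
  y = rep(c("A", "B", "C", "D"), each = 250)
)

# Hypothetical helper: slope (a) and intercept (b) of the line through the
# first and third quartiles of `x`, plotted against normal quantiles.
qq_coef <- function(x, probs = c(0.25, 0.75)) {
  q_sample <- quantile(x, probs = probs, names = FALSE)
  q_theory <- qnorm(probs)
  slope <- diff(q_sample) / diff(q_theory)
  c(a = slope, b = q_sample[1] - slope * q_theory[1])
}

# split() groups x by y; sapply() returns one column per level, so we
# transpose to get one row of coefficients per level.
coefs <- t(sapply(split(df2$x, df2$y), qq_coef))
ab_df2 <- data.frame(y = rownames(coefs), coefs, row.names = NULL)
ab_df2
```

The resulting data frame has the same columns (y, a, b) as ab_df, so it can be passed to geom_abline(data = ..., aes(slope = a, intercept = b)) unchanged.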
Lastly, we can also set facet-specific labels. This is useful if, for example, the levels of our facetting variable aren’t particularly clear or informative, but we don’t want to modify the underlying data object.
# We need to order our desired labels to match the order of the variable levels.
# We use levels(factor(df$y)) because df$y may be a character vector (R >= 4.0),
# in which case levels(df$y) would return NULL:
y_levels <- levels(factor(df$y))
labels <- paste0(y_levels, ", sd = ", round(tapply(df$x, df$y, sd), digits = 3))
# We also need to name the labels according to the levels of our facetted variable:
y_labels <- c(A = labels[1], B = labels[2], C = labels[3], D = labels[4])
# We pass our named vector of labels to labeller(), keyed by the name of the facetting variable:
ggplot(data = df, aes(sample = x)) + stat_qq() +
  geom_abline(data = ab_df, aes(slope = a, intercept = b), linetype = "dashed",
              colour = "red4") +
  facet_grid(. ~ y, labeller = labeller(y = y_labels))
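Writing out c(A = ..., B = ...) by hand becomes tedious with many levels. As a sketch, the named label vector can be built programmatically with setNames(). The data are re-simulated here with an illustrative seed, and df3, lvls, sds and y_labels2 are hypothetical names.

```r
set.seed(1)  # illustrative seed
df3 <- data.frame(
  x = c(rnorm(250, sd = 0.5), rnorm(250, sd = 1),
        rnorm(250, sd = 2), rnorm(250, sd = 3)),
  y = rep(c("A", "B", "C", "D"), each = 250)
)

# Levels of the facetting variable, whether y is a character or a factor.
lvls <- levels(factor(df3$y))

# Per-level standard deviations, indexed by name so the order matches lvls.
sds <- tapply(df3$x, df3$y, sd)

# Named character vector: names are the levels, values are the labels.
y_labels2 <- setNames(paste0(lvls, ", sd = ", round(sds[lvls], digits = 3)), lvls)
y_labels2
```

The result can be passed to facet_grid() as labeller = labeller(y = y_labels2), exactly like the hand-written y_labels.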