For years, I avoided ggplot2 because I didn’t want to depend
on a third-party package for something as fundamental as plotting.
Base graphics felt safer: built-in, permanent, blessed. I
invested heavily in mastering base plotting and became
good at it. I knew my way around par(), margins, layout
gymnastics, layering hacks, and all the little incantations needed to
make complex base plots.
But eventually I learned something important:
ggplot2 is not just another package. It is a core pillar of modern R.
If you’re new to R today, you can safely skip mastering
base graphics (beyond the basics required for maintaining
legacy code).
Instead, spend your precious time learning ggplot2, which is not “just syntactic sugar.” It is an expressive system, quite literally a grammar, for building correct, scale-aware, reusable visualizations.
The purpose of this post is to show a concrete example illustrating why.
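We’ll walk through:
Example 1: a simple scatter plot (base vs ggplot2)
Example 2: a complex faceted plot (base vs ggplot2)
Example 3: the same faceted plot with a log-scaled x-axis (base: pseudo-log axis; ggplot2: true log coordinate transformation)
Along the way, we’ll highlight the conceptual differences that make
ggplot2 fundamentally superior to base
plotting.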
First, let’s create a sample dataset, which we will use for all examples.
library(tidyverse)
library(scales)
set.seed(123)
n_per_cell <- 80
f1_levels <- c("A", "B", "C")
f2_levels <- c("1", "2")
df <- expand_grid(
  f1 = factor(f1_levels, levels = f1_levels),
  f2 = factor(f2_levels, levels = f2_levels),
  i = 1:n_per_cell
) |>
  mutate(
    x = runif(n(), 1, 100),
    y = 0.5 * x +
      as.numeric(f1) * 5 +
      as.numeric(f2) * 3 +
      rnorm(n(), sd = 10)
  ) |>
  select(-i)
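Before plotting, it can help to confirm the shape of the design. The sketch below rebuilds only the factor grid (not the random x and y values) and counts rows per cell. The names `df_check` and `cell_counts` are illustrative and not part of the original code.

```r
library(dplyr)
library(tidyr)

n_per_cell <- 80

# Rebuild just the 3 x 2 factor grid used for the sample data
df_check <- expand_grid(
  f1 = factor(c("A", "B", "C")),
  f2 = factor(c("1", "2")),
  i  = 1:n_per_cell
) |>
  select(-i)

# One row per (f1, f2) cell, with the number of observations in column n
cell_counts <- count(df_check, f1, f2)
print(cell_counts)
```

Every one of the six cells should hold the same number of rows, which is what lets a fixed 2 x 3 grid work in the base examples that follow.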
Example 1a: base scatter plot
plot(df$x, df$y,
     pch = 21, bg = "lightgreen",
     main = "Example 1a: Base scatter plot",
     xlab = "x", ylab = "y")
Example 1b: ggplot2 scatter plot
ggplot(df, aes(x, y)) +
  geom_point(shape = 21, fill = "lightgreen") +
  labs(
    title = "Example 1b: ggplot2 scatter plot",
    x = "x",
    y = "y"
  ) +
  theme_bw(base_size = 12)
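Example 2: complex faceted plot (base vs ggplot2)
Here, we’re going to construct a complex faceted plot with multiple
panels based on two factors (f1 and f2). Each
panel will include the raw points, a linear regression line (red), a loess smooth (dashed blue), and a custom percent-style y-axis.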
Look at all of the “blocking and tackling” we have to do in
base graphics to create this faceted plot with custom axes!
NOTE: Although this base code works as written, there is a nasty
bug waiting to bite you if the number of levels of either factor used
to create the grid layout ever changes. By contrast, the
corresponding ggplot2 code has no such problem. Down below,
I’ll explain what the bug is, but for now, see if you can figure it
out.
# 2 x 3 panel grid with inner (mar) and outer (oma) margins
par(mfrow = c(2, 3),
    mar = c(3.5, 3.5, 2.5, 1),
    oma = c(4, 4, 4, 1),
    cex.main = 0.9)
for (a in levels(df$f1)) {
  for (b in levels(df$f2)) {
    subdat <- subset(df, f1 == a & f2 == b)
    xrange <- range(subdat$x)
    yrange <- range(subdat$y)
    # Empty frame; yaxt = "n" suppresses the default y-axis so the
    # custom axis drawn below does not overprint it
    plot(xrange, yrange,
         type = "n",
         xlab = "",
         ylab = "",
         yaxt = "n",
         main = paste("f1 =", a, ", f2 =", b),
         cex.axis = 0.7)
    points(subdat$x, subdat$y,
           pch = 16, col = rgb(0, 0, 0, 0.4))
    # Linear fit (solid red)
    fit_lm <- lm(y ~ x, data = subdat)
    abline(fit_lm, col = "red", lwd = 2)
    # Loess smooth (dashed blue), evaluated on a regular grid of x values
    fit_lo <- loess(y ~ x, data = subdat)
    xs <- seq(min(subdat$x), max(subdat$x), length.out = 100)
    lines(xs, predict(fit_lo, xs),
          col = "blue", lwd = 2, lty = 2)
    # Custom y-axis: value plus its share of the largest tick
    at_y <- pretty(subdat$y)
    axis(2,
         at = at_y,
         labels = paste0(round(at_y, 1), " (",
                         round(at_y / max(at_y) * 100), "%)"),
         cex.axis = 0.7)
  }
}
# Shared title and axis labels in the outer margins
mtext("Example 2a: Complex 'base' plot (linear x)",
      side = 3, outer = TRUE, line = 1,
      cex = 1.6, font = 2)
mtext("X label (linear scale)",
      side = 1, outer = TRUE, line = 2.5,
      cex = 1.2, font = 3)
mtext("Y label (percent-style)",
      side = 2, outer = TRUE, line = 2.5,
      cex = 1.2, font = 3)
Already, you can see how much simpler and more robust the
ggplot2 code is for creating the same faceted plot. Notice
that we don’t have to worry about setting up the layout manually or
handling axis labels for each panel; ggplot2 takes care of
all that for us. The ggplot2 code is also immune to the bug
mentioned earlier, because faceting is handled declaratively.
ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  geom_smooth(method = "loess", se = FALSE, color = "blue", linetype = "dashed") +
  scale_x_continuous(name = "X label (linear scale)") +
  scale_y_continuous(
    labels = label_percent(scale = 1),
    name = "Y label (percent-style)"
  ) +
  labs(title = "Example 2b: Complex 'ggplot2' plot (linear x)") +
  theme_bw(base_size = 11) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.title.x = element_text(face = "italic", size = 12),
    axis.title.y = element_text(face = "italic", size = 12)
  ) +
  facet_grid(f1 ~ f2)
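The labels argument of a ggplot2 scale takes a function, which is what makes the percent-style axis a one-liner. The sketch below shows what label_percent(scale = 1) returns. It also shows a hypothetical helper, `label_value_share()` (my name, not part of scales or the original post), that mimics the "value (share of the largest tick)" format used in the base version.

```r
library(scales)

# Built-in labeller: scale = 1 appends "%" without multiplying by 100
pct <- label_percent(scale = 1)
pct(c(0, 25, 50))
# "0%" "25%" "50%"

# Hypothetical helper mirroring the base-graphics label format:
# the break value followed by its share of the largest break
label_value_share <- function(breaks) {
  paste0(round(breaks, 1), " (",
         round(breaks / max(breaks, na.rm = TRUE) * 100), "%)")
}
label_value_share(c(0, 20, 40, 60))
# "0 (0%)" "20 (33%)" "40 (67%)" "60 (100%)"
```

A function like this could presumably be passed as scale_y_continuous(labels = label_value_share) to get the same label style in the faceted ggplot2 version, with no per-panel axis() calls.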
Example 3: log-scaled x-axis (base vs ggplot2)
The plots we’re creating here are similar to Example 2, but now we
want the x-axis to be on a logarithmic scale. Here, you’re going to see
how the differences between base graphics and
ggplot2 become even more pronounced.
Notice that this code is very similar to Example 2a, but now we have to manually adjust the x-axis to simulate a logarithmic scale. This process involves calculating the tick marks and labels ourselves, which adds complexity and potential for error. Additionally, the same bug mentioned earlier is still present here.
# 2 x 3 panel grid with inner (mar) and outer (oma) margins
par(mfrow = c(2, 3),
    mar = c(3.5, 3.5, 2.5, 1),
    oma = c(4, 4, 4, 1),
    cex.main = 0.9)
for (a in levels(df$f1)) {
  for (b in levels(df$f2)) {
    subdat <- subset(df, f1 == a & f2 == b)
    xrange <- range(subdat$x)
    yrange <- range(subdat$y)
    # Empty frame; xaxt/yaxt = "n" suppress the default axes so the
    # custom axes drawn below do not overprint them
    plot(xrange, yrange,
         type = "n",
         xlab = "",
         ylab = "",
         xaxt = "n",
         yaxt = "n",
         main = paste("f1 =", a, ", f2 =", b),
         cex.axis = 0.7)
    points(subdat$x, subdat$y,
           pch = 16, col = rgb(0, 0, 0, 0.4))
    # Linear fit (solid red)
    fit_lm <- lm(y ~ x, data = subdat)
    abline(fit_lm, col = "red", lwd = 2)
    # Loess smooth (dashed blue), evaluated on a regular grid of x values
    fit_lo <- loess(y ~ x, data = subdat)
    xs <- seq(min(subdat$x), max(subdat$x), length.out = 100)
    lines(xs, predict(fit_lo, xs),
          col = "blue", lwd = 2, lty = 2)
    # Pseudo-log x-axis: ticks are chosen in log10 space, but the
    # plotting region itself is still linear
    at_x_log <- pretty(log10(subdat$x))
    axis(1,
         at = 10 ^ at_x_log,
         labels = round(10 ^ at_x_log, 1),
         cex.axis = 0.7)
    # Custom y-axis: value plus its share of the largest tick
    at_y <- pretty(subdat$y)
    axis(2,
         at = at_y,
         labels = paste0(round(at_y, 1), " (",
                         round(at_y / max(at_y) * 100), "%)"),
         cex.axis = 0.7)
  }
}
# Shared title and axis labels in the outer margins
mtext("Example 3a: Complex 'base' plot with pseudo-log x",
      side = 3, outer = TRUE, line = 1,
      cex = 1.6, font = 2)
mtext("X label (pseudo-log scale)",
      side = 1, outer = TRUE, line = 2.5,
      cex = 1.2, font = 3)
mtext("Y label (percent-style)",
      side = 2, outer = TRUE, line = 2.5,
      cex = 1.2, font = 3)
By contrast, ggplot2 makes it straightforward to apply a
true logarithmic transformation to the x-axis using
coord_transform() (named coord_trans() in older ggplot2 releases).
Because a coordinate transformation is applied after the statistics
are computed, the smoothers are still fit on the original x values, and
then the points, fitted lines, gridlines, and tick marks are all drawn
consistently in log10 space. The result is an accurate and visually
appealing plot. Also, the ggplot2
code remains immune to the bug mentioned earlier, thanks to its
declarative faceting system.
ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  geom_smooth(method = "loess", se = FALSE, color = "blue", linetype = "dashed") +
  scale_x_continuous(name = "X label (log10 scale)") +
  scale_y_continuous(
    labels = label_percent(scale = 1),
    name = "Y label (percent-style)"
  ) +
  labs(title = "Example 3b: Complex 'ggplot2' plot with true log-x") +
  theme_bw(base_size = 11) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.title.x = element_text(face = "italic", size = 12),
    axis.title.y = element_text(face = "italic", size = 12)
  ) +
  coord_transform(x = "log10") +
  facet_grid(f1 ~ f2)
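One design choice is worth spelling out. A coordinate transformation changes how things are drawn, while a scale transformation changes the data before any statistic sees it. The sketch below is a swapped-in alternative, not the approach used in Example 3b. It applies scale_x_log10() to a small toy dataset (`toy` and `p_scale` are illustrative names) and inspects what the smoother actually received.

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(x = runif(200, 1, 100))
toy$y <- 0.5 * toy$x + rnorm(200, sd = 10)

# Scale transformation: x is log10-transformed BEFORE the statistic runs,
# so the linear model is fit against log10(x)
p_scale <- ggplot(toy, aes(x, y)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_log10()

# The x values computed for the smooth layer are in log10 units
smooth_x <- layer_data(p_scale, 2)$x
range(smooth_x)
```

Since the toy x values lie between 1 and 100, the range printed here falls between 0 and 2, which confirms the fit happened in log space. With a coordinate transformation, the fit is made on the raw x values and only the drawing is warped, so a straight lm line appears curved on the log axis. Which behavior you want depends on the question you are asking of the data.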
Base graphics handles transformations by hacking the axis labels.
To create a lattice-style layout, base graphics also requires
manual layout management using
par(mfrow=...), which can lead to bugs if the number of
panels changes dynamically. And look at the nested for-loop structure
of the base graphics approach. Do you see how cumbersome
and error-prone it is compared to the declarative approach of
ggplot2, where core layout features are data-driven?
What is the bug in base graphics?
In our examples, if we were to add or remove levels from
f1 or f2 (or if one just so happened to be
empty), the layout would break unless we manually adjusted the
mfrow parameters accordingly: with the grid fixed at 2 x 3, a seventh
panel spills onto a new page, and fewer than six panels leave part of
the grid blank. This is because
base graphics relies on hard-coded layout specifications,
whereas ggplot2 automatically adapts to the data structure
when faceting.
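If you do have to keep a base version alive, one way to defuse the bug is to derive the grid from the data instead of typing it in. The following is a minimal sketch under that assumption, using a toy data frame with four levels of f1 so that a hard-coded c(2, 3) would no longer fit. The names `toy` and `grid_dims` are illustrative.

```r
# Toy data with the same factor structure as the examples, but 4 x 2 cells
set.seed(123)
toy <- expand.grid(f1 = c("A", "B", "C", "D"), f2 = c("1", "2"), i = 1:20)
toy$x <- runif(nrow(toy), 1, 100)
toy$y <- 0.5 * toy$x + rnorm(nrow(toy), sd = 10)

# Drop unused levels so empty factor levels do not reserve panels
toy$f1 <- droplevels(toy$f1)
toy$f2 <- droplevels(toy$f2)

# Derive the grid from the data: rows follow f1, columns follow f2
grid_dims <- c(nlevels(toy$f1), nlevels(toy$f2))

old_par <- par(mfrow = grid_dims, mar = c(3, 3, 2, 1))
for (a in levels(toy$f1)) {
  for (b in levels(toy$f2)) {
    subdat <- toy[toy$f1 == a & toy$f2 == b, ]
    # Guard against a combination with no rows: keep its slot but skip drawing
    if (nrow(subdat) == 0) {
      plot.new()
      next
    }
    plot(subdat$x, subdat$y,
         main = paste("f1 =", a, ", f2 =", b),
         xlab = "", ylab = "")
  }
}
par(old_par)  # restore the previous graphics settings
```

This makes the layout match facet_grid(f1 ~ f2), with f1 in rows and f2 in columns, but notice that it is still bookkeeping you have to remember to write yourself.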
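ggplot2 handles transformations and layout through a coherent grammar that
separates data, aesthetic mappings, geometric layers, statistics, scales, coordinate systems, facets, and themes.
The result is a system that is correct, scale-aware, and reusable.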
If you’re serious about data visualization in R, investing time in
learning ggplot2 is essential. Not only does
ggplot2 simplify the process of creating complex plots,
which gives you “more headspace” to think about your data instead of
plotting mechanics, but it also ensures that your visualizations are
accurate and maintainable. By embracing ggplot2, you align
yourself with the modern R ecosystem, where reproducibility, clarity,
and expressiveness are paramount. So, ditch the old base
graphics habits and dive into the world of ggplot2. Your
future self (and your plots) will thank you!