Introduction

For years, I avoided ggplot2 because I didn’t want to depend on a third-party package for something as fundamental as plotting. Base graphics felt safer: built-in, permanent, blessed. I invested heavily in mastering base plotting and became good at it. I knew my way around par(), margins, layout gymnastics, layering hacks, and all the little incantations needed to make complex base plots.

But eventually I learned something important:

ggplot2 is not just another package. It is a core pillar of modern R.

If you’re new to R today, you can safely skip mastering base graphics (beyond the basics required for maintaining legacy code).

Instead, spend your precious time learning ggplot2, which is not “just syntactic sugar.” It is an expressive system, quite literally a grammar, for building correct, scale-aware, reusable visualizations.

The purpose of this post is to show a concrete example illustrating why.

We’ll walk through:

  1. A simple scatter plot (base vs ggplot2)
  2. A complex faceted plot on a linear x-axis (base vs ggplot2)
  3. A complex faceted plot on a log-scaled x-axis
    • base: pseudo-log axis
    • ggplot2: true log coordinate transformation

Along the way, we’ll highlight the conceptual differences that make ggplot2 fundamentally superior to base plotting.


Setup

First, let’s create a sample dataset, which we will use for all examples.

library(tidyverse)
library(scales)

set.seed(123)

n_per_cell <- 80
f1_levels  <- c("A", "B", "C")
f2_levels  <- c("1", "2")

df <- expand_grid(
  f1 = factor(f1_levels, levels = f1_levels),
  f2 = factor(f2_levels, levels = f2_levels),
  i  = 1:n_per_cell
) |>
  mutate(
    x = runif(n(), 1, 100),
    y = 0.5 * x +
        as.numeric(f1) * 5 +
        as.numeric(f2) * 3 +
        rnorm(n(), sd = 10)
  ) |>
  select(-i)

Example 1: Simple scatter (base vs ggplot2)

Example 1a: base scatter plot

plot(df$x, df$y,
     pch = 21, bg = "lightgreen",
     main = "Example 1a: Base scatter plot",
     xlab = "x", ylab = "y")

Example 1b: ggplot2 scatter plot

ggplot(df, aes(x, y)) +
  geom_point(shape = 21, fill = "lightgreen") +
  labs(
    title = "Example 1b: ggplot2 scatter plot",
    x = "x",
    y = "y"
  ) +
  theme_bw(base_size = 12)


Example 2: Complex faceted plot (base vs ggplot2)

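Here, we’re going to construct a complex faceted plot with multiple panels based on two factors (f1 and f2). Each panel will include the raw points, a linear regression line (red), a loess smooth (dashed blue), and a y-axis with custom percent-style labels.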

Example 2a: Complex base plot (linear x)

Look at all of the “blocking and tackling” we have to do in base graphics to create this faceted plot with custom axes! NOTE: Although this base code works as written, there is a nasty bug waiting to bite you if you change the number of levels of either factor used to create the grid layout. By contrast, the corresponding ggplot2 code has no such problem. I’ll explain the bug further down, but for now, see if you can figure it out.

par(mfrow = c(2, 3),
    mar   = c(3.5, 3.5, 2.5, 1),
    oma   = c(4, 4, 4, 1),
    cex.main = 0.9)

for (a in levels(df$f1)) {
  for (b in levels(df$f2)) {
    subdat <- subset(df, f1 == a & f2 == b)

    xrange <- range(subdat$x)
    yrange <- range(subdat$y)

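    # Suppress the default y-axis so it does not overlap the custom
    # percent-style axis drawn with axis(2, ...) further down
    plot(xrange, yrange,
         type = "n",
         xlab = "",
         ylab = "",
         yaxt = "n",
         main = paste("f1 =", a, ", f2 =", b),
         cex.axis = 0.7)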

    points(subdat$x, subdat$y,
           pch = 16, col = rgb(0, 0, 0, 0.4))

    fit_lm <- lm(y ~ x, data = subdat)
    abline(fit_lm, col = "red", lwd = 2)

    fit_lo <- loess(y ~ x, data = subdat)
    xs <- seq(min(subdat$x), max(subdat$x), length.out = 100)
    lines(xs, predict(fit_lo, xs),
          col = "blue", lwd = 2, lty = 2)

    at_y <- pretty(subdat$y)
    axis(2,
         at     = at_y,
         labels = paste0(round(at_y, 1), " (",
                         round(at_y / max(at_y) * 100), "%)"),
         cex.axis = 0.7)
  }
}

mtext("Example 2a: Complex 'base' plot (linear x)",
      side = 3, outer = TRUE, line = 1,
      cex = 1.6, font = 2)

mtext("X label (linear scale)",
      side = 1, outer = TRUE, line = 2.5,
      cex = 1.2, font = 3)

mtext("Y label (percent-style)",
      side = 2, outer = TRUE, line = 2.5,
      cex = 1.2, font = 3)


Example 2b: Complex ggplot2 plot (linear x)

Already, you can see how much simpler and more robust the ggplot2 code is for creating the same faceted plot. Notice that we don’t have to worry about setting up the layout manually or handling axis labels for each panel; ggplot2 takes care of all that for us. The ggplot2 code is also immune to the bug mentioned earlier, because faceting is handled declaratively.

ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm",    se = FALSE, color = "red") +
  geom_smooth(method = "loess", se = FALSE, color = "blue", linetype = "dashed") +
  scale_x_continuous(name = "X label (linear scale)") +
  scale_y_continuous(
    labels = label_percent(scale = 1),
    name   = "Y label (percent-style)"
  ) +
  labs(title = "Example 2b: Complex 'ggplot2' plot (linear x)") +
  theme_bw(base_size = 11) +
  theme(
    plot.title   = element_text(face = "bold",  size = 16),
    axis.title.x = element_text(face = "italic", size = 12),
    axis.title.y = element_text(face = "italic", size = 12)
  ) +
  facet_grid(f1 ~ f2)
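
One difference worth noting: the base version labels each tick with both the raw value and its share of the largest tick, while label_percent(scale = 1) simply appends a percent sign. If you want the ggplot2 axis to mimic the base labels, the labels argument of a scale also accepts a function of the breaks. Here is a minimal sketch on toy data; label_value_pct is a hypothetical helper name of my own, not part of ggplot2 or scales.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2 or scales): formats each break as
# "value (percent of the largest break)", similar in spirit to the labels
# built for axis(2, ...) in the base version.
label_value_pct <- function(breaks) {
  top <- max(breaks, na.rm = TRUE)
  paste0(round(breaks, 1), " (", round(breaks / top * 100), "%)")
}

# Quick look at what the helper produces for a few breaks
label_value_pct(c(0, 25, 50))

# Toy data so the sketch runs on its own; swap in df from the Setup section.
toy <- data.frame(x = 1:10, y = seq(5, 50, by = 5))

p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  scale_y_continuous(labels = label_value_pct)
```

Because the labelling logic lives in one small function, the same helper can be reused on any continuous scale, which is harder to arrange when the labels are built inside a plotting loop.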


Example 3: Complex faceted plot with log x-axis (base vs ggplot2)

The plots we’re creating here are similar to Example 2, but now we want the x-axis to be on a logarithmic scale. Here, you’re going to see how the differences between base graphics and ggplot2 become even more pronounced.

Example 3a: Complex base plot (pseudo-log x)

Notice that this code is very similar to Example 2a, but now we have to manually adjust the x-axis to simulate a logarithmic scale. This process involves calculating the tick marks and labels ourselves, which adds complexity and potential for error. Additionally, the same bug mentioned earlier is still present here.

par(mfrow = c(2, 3),
    mar   = c(3.5, 3.5, 2.5, 1),
    oma   = c(4, 4, 4, 1),
    cex.main = 0.9)

for (a in levels(df$f1)) {
  for (b in levels(df$f2)) {
    subdat <- subset(df, f1 == a & f2 == b)

    xrange <- range(subdat$x)
    yrange <- range(subdat$y)

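    # Suppress the default y-axis so it does not overlap the custom
    # percent-style axis drawn with axis(2, ...) further down
    plot(xrange, yrange,
         type = "n",
         xlab = "",
         ylab = "",
         yaxt = "n",
         main = paste("f1 =", a, ", f2 =", b),
         cex.axis = 0.7)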

    points(subdat$x, subdat$y,
           pch = 16, col = rgb(0, 0, 0, 0.4))

    fit_lm <- lm(y ~ x, data = subdat)
    abline(fit_lm, col = "red", lwd = 2)

    fit_lo <- loess(y ~ x, data = subdat)
    xs <- seq(min(subdat$x), max(subdat$x), length.out = 100)
    lines(xs, predict(fit_lo, xs),
          col = "blue", lwd = 2, lty = 2)

    at_x_log <- pretty(log10(subdat$x))
    axis(1,
         at     = 10 ^ at_x_log,
         labels = round(10 ^ at_x_log, 1),
         cex.axis = 0.7)

    at_y <- pretty(subdat$y)
    axis(2,
         at     = at_y,
         labels = paste0(round(at_y, 1), " (",
                         round(at_y / max(at_y) * 100), "%)"),
         cex.axis = 0.7)
  }
}

mtext("Example 3a: Complex 'base' plot with pseudo-log x",
      side = 3, outer = TRUE, line = 1,
      cex = 1.6, font = 2)

mtext("X label (pseudo-log scale)",
      side = 1, outer = TRUE, line = 2.5,
      cex = 1.2, font = 3)

mtext("Y label (percent-style)",
      side = 2, outer = TRUE, line = 2.5,
      cex = 1.2, font = 3)
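
For what it’s worth, base graphics can draw a real log axis through the log = "x" argument to plot(). The sketch below is a minimal single-panel version on toy data; plot_log_x_panel is a hypothetical helper name of my own. The catch is that abline() draws a straight line in the transformed coordinate system unless you pass untf = TRUE, so a model fit as y ~ x needs that extra argument to be drawn correctly. It is one more detail to manage by hand.

```r
# Hypothetical helper: one panel with a real log10 x-axis in base graphics.
plot_log_x_panel <- function(dat) {
  plot(dat$x, dat$y, log = "x",
       pch = 16, col = rgb(0, 0, 0, 0.4),
       xlab = "x (log10 scale)", ylab = "y")

  # The model is fit on the original x, so ask abline() to "untransform":
  # untf = TRUE draws the fitted line as a curve on the log axis.
  fit_lm <- lm(y ~ x, data = dat)
  abline(fit_lm, col = "red", lwd = 2, untf = TRUE)

  # The loess curve is drawn from predictions on a grid of x values,
  # which lines() places correctly on the log axis.
  fit_lo <- loess(y ~ x, data = dat)
  xs <- seq(min(dat$x), max(dat$x), length.out = 100)
  lines(xs, predict(fit_lo, newdata = data.frame(x = xs)),
        col = "blue", lwd = 2, lty = 2)

  invisible(fit_lm)
}

# Toy data so the sketch runs on its own
set.seed(1)
toy <- data.frame(x = runif(80, 1, 100))
toy$y <- 0.5 * toy$x + rnorm(80, sd = 10)

plot_log_x_panel(toy)
```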


Example 3b: Complex ggplot2 plot with true log-x (coord_transform)

By contrast, ggplot2 makes it straightforward to apply a true logarithmic transformation to the x-axis using coord_transform() (called coord_trans() in older ggplot2 releases). A coordinate transformation is applied after the statistics are computed: the lm and loess fits are estimated on the original data, and every geometric object is then drawn on the log-scaled axis, so the straight lm line correctly appears as a curve. Also, the ggplot2 code remains immune to the bug mentioned earlier, thanks to its declarative faceting system.

ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm",    se = FALSE, color = "red") +
  geom_smooth(method = "loess", se = FALSE, color = "blue", linetype = "dashed") +
  scale_x_continuous(name = "X label (log10 scale)") +
  scale_y_continuous(
    labels = label_percent(scale = 1),
    name   = "Y label (percent-style)"
  ) +
  labs(title = "Example 3b: Complex 'ggplot2' plot with true log-x") +
  theme_bw(base_size = 11) +
  theme(
    plot.title   = element_text(face = "bold",  size = 16),
    axis.title.x = element_text(face = "italic", size = 12),
    axis.title.y = element_text(face = "italic", size = 12)
  ) +
  coord_transform(x = "log10") +
  facet_grid(f1 ~ f2)
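
To see why a coordinate transformation matters here, compare it with a scale transformation such as scale_x_log10(). A scale transformation is applied before statistics are computed, so geom_smooth(method = "lm") fits y against log10(x). A coordinate transformation is applied afterwards, so the fit is made on the original x and only the drawing is warped. The sketch below uses toy data and assumes ggplot2 is installed; it falls back to coord_trans() when the installed ggplot2 does not export coord_transform().

```r
library(ggplot2)

# Toy data so the sketch runs on its own
set.seed(1)
toy <- data.frame(x = runif(200, 1, 100))
toy$y <- 0.5 * toy$x + rnorm(200, sd = 10)

base_plot <- ggplot(toy, aes(x, y)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red")

# Scale transformation: x is log10-transformed BEFORE the statistic runs,
# so the model is effectively y ~ log10(x) and plots as a straight line.
p_scale <- base_plot + scale_x_log10()

# Coordinate transformation: the model is fit on the original x, and the
# fitted line is bent when it is drawn on the log axis.
coord_log_x <- if ("coord_transform" %in% getNamespaceExports("ggplot2")) {
  ggplot2::coord_transform(x = "log10")
} else {
  ggplot2::coord_trans(x = "log10")  # older ggplot2 releases
}
p_coord <- base_plot + coord_log_x

# The x values handed to the smoother differ between the two approaches
range(layer_data(p_scale, 2)$x)  # log10 units
range(layer_data(p_coord, 2)$x)  # original units
```

Which of the two you want depends on the question: use the scale when the model itself should be log-linear, and the coordinate transformation when only the display should change.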


Discussion

In Example 3a, the base version fakes the log transformation by hand-placing axis ticks and labels; the data are still drawn on a linear axis. To get a lattice-style layout, base graphics also requires manual layout management with par(mfrow = ...), which leads to bugs if the number of panels changes. And look at the nested for-loop structure: compared with the declarative approach of ggplot2, where core layout features are data-driven, it is cumbersome and error-prone.

Where’s the potential bug in base graphics?

In our examples, if we were to add or remove levels from f1 or f2 (or if one just so happened to be empty), the layout would break unless we manually adjusted the mfrow parameters accordingly. This is because base graphics relies on hard-coded layout specifications, whereas ggplot2 automatically adapts to the data structure when faceting.
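
One way to defuse the bug in the base version is to derive the grid dimensions from the data rather than typing them in. The sketch below is an assumption about how you might patch it, not part of the original examples; panel_grid_dims is a hypothetical helper name. It counts only the levels that actually occur in the data.

```r
# Hypothetical helper: number of panel rows and columns implied by two factors.
# droplevels() discards levels with no observations, so empty panels are not
# counted (facet_grid() also drops unused levels by default).
panel_grid_dims <- function(data, row_var, col_var) {
  c(rows = nlevels(droplevels(data[[row_var]])),
    cols = nlevels(droplevels(data[[col_var]])))
}

# Toy data with an unused level "D" in f1
toy <- data.frame(
  f1 = factor(c("A", "A", "B", "B", "C", "C"), levels = c("A", "B", "C", "D")),
  f2 = factor(c("1", "2", "1", "2", "1", "2"))
)

dims <- panel_grid_dims(toy, "f1", "f2")
dims  # 3 rows and 2 columns; the unused level "D" is ignored

# Use the computed dimensions instead of a hard-coded c(2, 3)
op <- par(mfrow = unname(dims))
par(op)  # restore the previous settings
```

With the outer loop over f1 and the inner loop over f2, a rows-by-f1, columns-by-f2 grid also lines up with facet_grid(f1 ~ f2). The loops themselves would need to iterate over the same dropped levels, which is exactly the kind of bookkeeping ggplot2 does for you.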

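ggplot2 handles transformations and layout through a coherent grammar that separates the data, the aesthetic mappings, the geoms and statistics, the scales, the coordinate system, and the facets.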

The result is a system that is

  1. more predictable,
  2. more correct,
  3. easier to understand, and
  4. vastly more expressive.

Conclusion

If you’re serious about data visualization in R, investing time in learning ggplot2 is essential. Not only does ggplot2 simplify the process of creating complex plots, giving you “more headspace” to think about your data instead of plotting mechanics, it also helps keep your visualizations accurate and maintainable. By embracing ggplot2, you align yourself with the modern R ecosystem, where reproducibility, clarity, and expressiveness are paramount. So, ditch the old base graphics habits and dive into the world of ggplot2. Your future self (and your plots) will thank you!