Background

Recently, I made a Reddit post titled “The Myth That ‘Base R Does Everything ggplot2 Does’ Needs to Die,” which triggered a lot of Redditors. That’s a good thing. A significant fraction of the replies claimed they had never heard anyone make such a ridiculous claim. Meanwhile, plenty of other comments claimed some version of the following:

  1. Your post doesn’t support your hypothesis.

  2. Base graphics are just as good as ggplot2 for data exploration (if not better).

Here, I address one of the comments from the “base-graphics-are-better” crowd.

Rebuttal to “Base R plotting is far superior for actual data analysis and exploration due to its simplicity”

“Ggplot obviously produces more aesthetically pleasing publication ready plots.”

Agreed.

“However, base R plotting is far superior for actual data analysis and exploration due to its simplicity.”

No. This statement isn’t even remotely true. If someone doesn’t know anything about ggplot2, I can see how they might believe it. The reverse holds, too: someone who doesn’t know base graphics could just as easily believe ggplot2 is the simpler option.

“Take the example in the linked post. In base R a simple scatter plot uses intuitive formula notation: plot(x ~ y, data). In ggplot2 this would require typing ggplot2(data, aes(x = x, y = y)) + geom_point().”

See Example 1 below.
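For a quick side-by-side before Example 1, here are the two minimal calls from the quote, written against R’s built-in mtcars data (chosen only so the snippet runs on its own). In formula notation the response goes on the left of the tilde, so plot(y ~ x, data) puts y on the vertical axis; the quoted plot(x ~ y, data) would put x there instead, the opposite of aes(x = x, y = y). Also, the plotting function is ggplot(); ggplot2 is the package name.

```r
library(ggplot2)

# Base graphics: response (mpg) left of ~, predictor (wt) right of ~
plot(mpg ~ wt, data = mtcars)

# ggplot2: aes() takes x and y as its first two positional arguments
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
print(p)
```

Both calls draw mpg against wt; the ggplot2 version is a few characters longer, not a different order of effort.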

“And function calls such as plot(model), biplot(pca) are even more convoluted to replicate in ggplot2.”

I disagree. I’d like to write an example showing why, but I won’t have time for a week or so. If you ping me then, I’ll put one together.


Examples

Create a data set

First, let’s create a data set to use in the examples below.

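library(tidyverse)  # dplyr, tidyr (expand_grid), and ggplot2
library(viridis)

set.seed(123)  # make the simulated data reproducible

n_per_cell <- 80
f1_levels  <- c("A", "B", "C")
f2_levels  <- c("1", "2")

# One row per combination of f1, f2, and replicate i (3 x 2 x 80 = 480 rows)
df <- expand_grid(
  f1 = factor(f1_levels, levels = f1_levels),
  f2 = factor(f2_levels, levels = f2_levels),
  i  = 1:n_per_cell
) |>
  mutate(
    x = runif(n(), 1, 100),
    # Linear trend in x, plus an offset per factor level, plus noise
    y = 0.5 * x +
        as.numeric(f1) * 5 +
        as.numeric(f2) * 3 +
        rnorm(n(), sd = 10)
  ) |>
  select(-i)  # drop the replicate index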
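The simulation leans on as.numeric() applied to a factor, which returns each value’s level index (1, 2, 3, ...) rather than its label. That is what turns f1 into offsets of 5, 10, and 15, and f2 into offsets of 3 and 6. The standalone snippet below illustrates this with made-up factor values; it also shows why the f2 labels “1” and “2” only coincidentally match their codes.

```r
f1 <- factor(c("A", "B", "C", "A"), levels = c("A", "B", "C"))
f2 <- factor(c("1", "2", "2", "1"), levels = c("1", "2"))

# Level indices, not labels: A -> 1, B -> 2, C -> 3
f1_offset <- as.numeric(f1) * 5   # 5 10 15 5
f2_offset <- as.numeric(f2) * 3   # 3 6 6 3

# Gotcha: numeric-looking labels still come back as level codes
g <- factor(c("10", "20"))
as.numeric(g)                     # 1 2
as.numeric(as.character(g))       # 10 20
```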

Example 1: Using formula notation with base graphics and ggplot2

Example 1a: Using base graphics

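# pch 21 and 22 are filled symbols (circle, square) whose interior color comes from bg
plot(
  y ~ x,
  data = df,
  main = "Example 1a: Using base graphics to print\na Scatter plot of y vs. x, colored by f1, shaped by f2",
  bg = as.numeric(df$f1),        # fill color = palette index of the f1 level
  pch = 20 + as.numeric(df$f2)   # 21 for f2 = "1", 22 for f2 = "2"
)
# Legend keys use the same filled symbol and palette indices as the points
legend("topright", legend = levels(df$f1), pch = 21, pt.bg = seq_along(levels(df$f1)), title = "f1")
legend("topleft", legend = levels(df$f2), pch = 20 + seq_along(levels(df$f2)), title = "f2")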

Example 1b: Using ggplot2

Create a ggplot2 wrapper function to mimic formula notation

First, let’s write a small helper that turns a y ~ x formula into the x and y aesthetics of a ggplot2 call.

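library(ggplot2)
library(rlang)  # f_lhs(), f_rhs(), and the !! injection operator

# Build a ggplot object from a two-sided formula (y ~ x), passing any
# extra aesthetics (color, fill, shape, ...) straight through to aes().
ggplot_formula <- function(data, formula, ...) {
  # Reject anything that is not a two-sided formula
  if (!inherits(formula, "formula") || length(formula) != 3L) {
    stop("`formula` must be of the form `y ~ x`.", call. = FALSE)
  }

  ggplot(
    data = data,
    mapping = aes(
      x = !!f_rhs(formula),  # right-hand side goes on the x-axis
      y = !!f_lhs(formula),  # left-hand side goes on the y-axis
      ...
    )
  )
}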
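The wrapper relies on two facts about R formulas. A two-sided formula such as y ~ x is a call of length 3 (the ~ operator, the left-hand side, and the right-hand side), while a one-sided formula such as ~ x has length 2. And rlang’s f_lhs() and f_rhs() return those sides as unevaluated expressions, which !! then injects into aes(). The standalone check below illustrates both; the formulas are arbitrary examples.

```r
library(rlang)

two_sided <- y ~ x
one_sided <- ~ x

length(two_sided)      # 3: `~`, y, x
length(one_sided)      # 2: `~`, x

f_lhs(two_sided)       # the symbol y
f_rhs(two_sided)       # the symbol x
f_lhs(one_sided)       # NULL, which is why the wrapper rejects one-sided formulas
f_rhs(y ~ log(x))      # the call log(x); aes() accepts expressions as well as symbols
```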

Use the wrapper function to enable formula notation with ggplot2.

df |>
  # Formula notation, with extra aesthetics forwarded to aes()
  ggplot_formula(y ~ x, color = f1, fill = f1, shape = f2) +
  geom_point(size = 4) +
  theme_bw() +
  labs(
    title = "Example 1b: Using ggplot2 to print\na Scatter plot of y vs. x, colored by f1, shaped by f2"
  )
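Worth noting alongside the wrapper: ggplot2 already accepts formula notation natively for faceting, through facet_wrap(~ var) and facet_grid(rows ~ cols). The sketch below uses the built-in mtcars data purely so it runs on its own; with the data set above, adding facet_grid(f2 ~ f1) would split the scatter plot into one panel per f1/f2 combination.

```r
library(ggplot2)

# Rows by transmission type (am), columns by cylinder count (cyl)
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(am ~ cyl) +
  theme_bw()
print(p)
```

Every am/cyl combination occurs in mtcars, so this produces a 2 x 3 grid of panels.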