Recently, I made a Reddit post titled “The Myth That ‘Base R Does Everything ggplot2 Does’ Needs to Die” that left a lot of Redditors triggered. That’s a good thing. A significant fraction of the replies claimed they had never heard anyone make such a ridiculous claim. Meanwhile, plenty of other comments made some version of the following claims:
Your post doesn’t support your hypothesis.
Base graphics are just as good as ggplot2 for data exploration (if not better).
Here, I address one of the comments from the “base-graphics-are-better” crowd.
“Ggplot obviously produces more aesthetically pleasing publication ready plots.”
Agreed.
“However, base R plotting is far superior for actual data analysis and exploration due to its simplicity.”
No. This statement isn’t even remotely true. If someone doesn’t know anything about ggplot2, I can see how they might believe this. However, the same can be said about base graphics, too.
“Take the example in the linked post. In base R a simple scatter plot uses intuitive formula notation: plot(x ~ y, data). In ggplot2 this would require typing ggplot2(data, aes(x = x, y = y)) + geom_point().”
See Example 1 below.
“And function calls such as plot(model), biplot(pca) are even more convoluted to replicate in ggplot2.”
I disagree. I’d like to create an example to show why, but I won’t have time for a week or so. If you’ll ping me then, I’ll write an example to show you why.
First, let’s create a data set to use in the examples below.
library(tidyverse)
library(viridis)
set.seed(123)
n_per_cell <- 80
f1_levels <- c("A", "B", "C")
f2_levels <- c("1", "2")
df <- expand_grid(
  f1 = factor(f1_levels, levels = f1_levels),
  f2 = factor(f2_levels, levels = f2_levels),
  i = 1:n_per_cell
) |>
  mutate(
    x = runif(n(), 1, 100),
    y = 0.5 * x +
      as.numeric(f1) * 5 +
      as.numeric(f2) * 3 +
      rnorm(n(), sd = 10)
  ) |>
  select(-i)
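As a quick sanity check on the design, the sketch below rebuilds only the grouping columns and counts the rows per cell. The name `design` is hypothetical and used only for this illustration; the simulated `x` and `y` columns are left out.

```r
library(tidyr)
library(dplyr)

# Rebuild only the design columns (the x and y simulation is omitted here).
# `design` is a hypothetical name used just for this check.
design <- expand_grid(
  f1 = factor(c("A", "B", "C")),
  f2 = factor(c("1", "2")),
  i = 1:80
)

nrow(design)           # 3 * 2 * 80 rows in total
count(design, f1, f2)  # one row per f1-by-f2 cell, each with n = 80
```

The data set is therefore balanced: every combination of f1 and f2 contributes the same number of points to the plots that follow.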
plot(
  y ~ x,
  data = df,
  main = "Example 1a: Using base graphics to print\na Scatter plot of y vs. x, colored by f1, shaped by f2",
  bg = as.numeric(df$f1),
  pch = 20 + as.numeric(df$f2)
)
legend("topright", legend = levels(df$f1), col = 1:length(levels(df$f1)), pch = 16, title = "f1")
legend("topleft", legend = levels(df$f2), pch = 20 + 1:length(levels(df$f2)), title = "f2")
ggplot2 wrapper function to mimic formula notation
First, let’s create a helper wrapper function for ggplot2 to mimic formula notation.
library(ggplot2)
library(rlang)
ggplot_formula <- function(data, formula, ...) {
  if (!inherits(formula, "formula") || length(formula) != 3L) {
    stop("`formula` must be of the form `y ~ x`.", call. = FALSE)
  }
  ggplot(
    data = data,
    mapping = aes(
      x = !!f_rhs(formula),
      y = !!f_lhs(formula),
      ...
    )
  )
}
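To see what the wrapper hands to aes(), it may help to look at the two rlang helpers on their own. The sketch below only illustrates how `f_lhs()` and `f_rhs()` split a formula and why the `length(formula) != 3L` check rejects one-sided formulas. The formulas used here are illustrative and not tied to df.

```r
library(rlang)

# f_lhs() and f_rhs() return the unevaluated left and right sides of a formula.
f_lhs(y ~ x)       # the symbol y
f_rhs(y ~ x)       # the symbol x

# A side can also be a call, which aes() accepts as a mapping expression.
f_rhs(y ~ log(x))  # the call log(x)

# A one-sided formula has no left-hand side, so f_lhs() returns NULL.
# Its length is 2 rather than 3, which is what the wrapper's check tests for.
f_lhs(~ x)
length(~ x)
length(y ~ x)
```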
df |>
  ggplot_formula(y ~ x, color = f1, fill = f1, shape = f2) +
  geom_point(size = 4) +
  theme_bw() +
  labs(
    title = "Example 1b: Using ggplot2 to print\na Scatter plot of y vs. x, colored by f1, shaped by f2"
  )
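Because the wrapper returns an ordinary ggplot object, further layers can be added to it like any other plot. The sketch below is one possible illustration, not part of the original example. It uses the built-in mtcars data as a stand-in for df, repeats a trimmed copy of the wrapper (without the input check) so the block runs on its own, and adds a linear fit per group as an assumed extra layer.

```r
library(ggplot2)
library(rlang)

# Trimmed copy of the wrapper above, repeated so this block is self-contained.
ggplot_formula <- function(data, formula, ...) {
  ggplot(data, aes(x = !!f_rhs(formula), y = !!f_lhs(formula), ...))
}

# mtcars ships with R and stands in for df here (an assumption for illustration).
p <- ggplot_formula(mtcars, mpg ~ wt, color = factor(cyl)) +
  geom_point() +
  # One straight-line fit per color group, without confidence bands.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_bw()

p
```

The formula covers only the x and y mappings. Everything else (color, extra layers, themes) stays in regular ggplot2 syntax.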