In his book “The Visual Display of Quantitative Information”, Edward R. Tufte introduces the term “data ink”. He defines it as the “non-erasable core of a graphic, the non-redundant ink arranged in response to variation in the numbers represented”. Conversely, non-data-ink comprises all those parts of a graphic that are not directly related to its core information. He also points out that some ink may technically be “data ink” and still be redundant. For example, many bar plots that already include an axis with labeled tick marks also state the heights of the bars explicitly with additional text above the bars. Tufte advocates keeping the following principles in mind when creating plots and visualizations of data:

Above all else show the data.
Maximize the data-ink ratio.
Erase non-data-ink.
Erase redundant data-ink.
Revise and edit.

In his book, Tufte goes over many different types of plots and illustrates how they can be simplified to satisfy these principles. In this post, we will consider three types of plots: scatter plots, bar plots and box plots. We will first have a look at how these plots are drawn by the base R plotting commands, i.e. plot(), barplot() and boxplot(), when the user leaves their numerous parameters and arguments at the defaults. Then, we will tweak these plots to make them look like the ones in Tufte’s book. Lastly, I will add what I think is the sweet spot between the two.

Scatter plots

Default look

set.seed(1234)
n <- 50
x <- 1:n + rnorm(n, 0, 10)
y <- 1:n + rnorm(n, 0, 10)

plot(
  x = x,
  y = y
)

Tufte

plot(
  x = x,
  y = y,
  pch = 16,
  ann = FALSE,
  axes = FALSE,
  frame.plot = FALSE
)
axis(
  side = 1,
  at = range(x),     # axis line spans only the range of the data
  lwd.ticks = 0,     # no tick marks
  labels = FALSE
)
axis(
  side = 2,
  at = range(y),
  lwd.ticks = 0,
  labels = FALSE
)
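
Because the axis lines above span exactly the range of the data, their two ends can also carry the minimum and maximum as labels, which is close to what Tufte calls a range-frame. The following is only a minimal sketch of that idea: the helper name range_axis() and the rounding to one decimal are my own assumptions and are not taken from the book.

```r
# Hypothetical helper (not part of base R): draw an axis line that spans
# only the data range and label its two ends with the rounded extremes.
range_axis <- function(side, values, digits = 1) {
  ends <- range(values)
  axis(
    side = side,
    at = ends,
    labels = as.character(round(ends, digits)),
    lwd.ticks = 0,   # axis line only, no tick marks
    las = 1
  )
  invisible(ends)    # return the two end points for inspection
}

set.seed(1234)
n <- 50
x <- 1:n + rnorm(n, 0, 10)
y <- 1:n + rnorm(n, 0, 10)

plot(x = x, y = y, pch = 16, ann = FALSE, axes = FALSE)
x_ends <- range_axis(1, x)
y_ends <- range_axis(2, y)
```

The plot stays as sparse as before, but a reader can now read off the extremes of both variables.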

Sweet spot

plot(
  x = x,
  y = y,
  pch = 16,
  xlim = c(-20, 60),
  ylim = c(-20, 80),
  ann = FALSE,
  axes = FALSE,
  frame.plot = FALSE
)
axis(
  side = 1,
  las = 1,
  at = seq(-20, 60, 20)
)
axis(
  side = 2,
  las = 1,
  at = seq(-20, 80, 20)
)
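
The limits and tick positions above are picked by hand for this particular sample. For other data, one option is to let pretty() choose round tick positions and to derive the limits from them. This is a sketch of that variation, not the code that produced the plot above, and the ticks it picks need not match the hand-picked ones.

```r
set.seed(1234)
n <- 50
x <- 1:n + rnorm(n, 0, 10)
y <- 1:n + rnorm(n, 0, 10)

# pretty() returns round, equally spaced values that cover the data range
x_ticks <- pretty(x)
y_ticks <- pretty(y)

plot(
  x = x,
  y = y,
  pch = 16,
  xlim = range(x_ticks),   # limits follow the outermost ticks
  ylim = range(y_ticks),
  ann = FALSE,
  axes = FALSE
)
axis(side = 1, las = 1, at = x_ticks)
axis(side = 2, las = 1, at = y_ticks)
```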

Bar plots

Default look

set.seed(1234)
n <- 10
x <- runif(n, 0, 100)
barnames <- LETTERS[1:n]

barplot(
  height = x,
  names.arg = barnames
)

Tufte

barplot(
  height = x,
  names.arg = barnames,
  border = NA,
  axes = FALSE
)
grid(
  nx = NA,          # no vertical lines
  ny = NULL,        # horizontal lines at the default y-axis ticks
  col = "white",
  lty = 1,
  lwd = 2
)
axis(
  side = 2,
  lty = 0,
  las = 1
)
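
The white lines above come from grid(), which places them at the default tick positions of the y-axis. If you would rather choose the positions yourself, abline() can draw the same kind of lines at explicit heights. This is a sketch under that assumption; the tick positions in ticks are picked by hand for values between 0 and 100.

```r
set.seed(1234)
n <- 10
x <- runif(n, 0, 100)
barnames <- LETTERS[1:n]

# Assumed tick positions, chosen by hand for data between 0 and 100
ticks <- seq(0, 100, 20)

barplot(
  height = x,
  names.arg = barnames,
  border = NA,
  axes = FALSE,
  ylim = range(ticks)
)
# White horizontal lines cut through the bars; on the white background
# outside the bars they are invisible.
abline(h = ticks[ticks > 0], col = "white", lwd = 2)
axis(side = 2, at = ticks, lty = 0, las = 1)
```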

Sweet spot

barplot(
  height = x,
  names.arg = barnames,
  border = NA,
  axes = FALSE
)
axis(
  side = 2,
  las = 1
)

Box plots

Default look

set.seed(1234)
n <- 100
dat <- data.frame(
  "A" = runif(n, 0, 100),
  "B" = rchisq(n, 20),
  "C" = rnorm(n, 50, 20),
  "D" = rlogis(n, 50, 10),
  "E" = rlnorm(n, 3, .5),
  "F" = runif(n, 30, 90),
  "G" = rnorm(n, 70, 10)
)

boxplot(dat)

Tufte

boxplot(
  dat,
  las = 1,
  axes = FALSE,
  frame.plot = FALSE,
  pars = list(
    whisklty = 1,       # whisker: line type 
    staplelty = 0,      # staple: line type
    boxcol = "white",   # box: border color
    boxfill = "white",  # box: fill color
    medlty = 0,         # median: line type
    medpch = 16,        # median: point type
    outpch = 1,         # outlier: point type
    outcex = 0.7        # outlier: point size
  )
)
axis(
  side = 1,
  lty = 0, 
  at = 1:ncol(dat),
  labels = colnames(dat)
)
axis(
  side = 2,
  lty = 0,
  las = 1,
  at = seq(0, 100, 20)
)
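
Even with the box erased, the plot above still encodes the same five numbers per group as the default box plot: the two whisker ends, the two hinges (where the whisker lines stop and the gap begins) and the median (the dot). As a quick check, these numbers can be computed with boxplot.stats(); the labels I attach to them here are only for readability.

```r
set.seed(1234)
a <- runif(100, 0, 100)

# boxplot.stats() returns the five values that boxplot() draws
s <- boxplot.stats(a)$stats
names(s) <- c(
  "lower whisker", "lower hinge", "median", "upper hinge", "upper whisker"
)
print(s)
```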

Sweet spot

ord <- order(sapply(dat, median))   # column order by increasing median
boxplot(
  dat[ord],
  axes = FALSE,
  frame.plot = FALSE,
  pars = list(
    whisklty = 1,       # whisker: line type
    staplelty = 0,      # staple: line type
    boxfill = "white",  # box: fill color
    boxwex = 0.3,       # box: width (default = 0.8)
    medlwd = 1,         # median: line width
    outpch = 1,         # outlier: point type
    outcex = 0.7        # outlier: point size
  )
)
axis(
  side = 1,
  lty = 0,
  at = 1:ncol(dat),
  labels = colnames(dat)[ord]
)
axis(
  side = 2,
  las = 1,
  at = seq(0, 100, 20)
)

Sources

Tufte, Edward R., 2001, “The Visual Display of Quantitative Information”, Graphics Press, Cheshire, Connecticut.