Asked to recreate a relatively straightforward bar chart in R, I realized there were a few features of the original that I didn’t know how to map, specify or otherwise produce using ggplot2. Specifically, the original plot:

* Considered numeric values across two (2) categorical variables: one (1) mapped to the bars, and one (1) mapped to fill, representing (common) components of the numeric sums,
* Ordered the bars by decreasing sum, and
* Used a sphere, rather than the default filled square, for the legend symbol.

Other than these, matching the original was just a matter of finding close-enough colors and specifying a handful of theme elements.

First, we’ll create our data object:

library(dplyr)
library(ggplot2)

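# bar categories: A-D, 25 observations each
category_1 <- rep(c("A", "B", "C", "D"), each = 25)
set.seed(9999)  # make the random draws below reproducible
# fill categories: W-Z, sampled with replacement
category_2 <- sample(x = LETTERS[23:26], size = 100, replace = TRUE)
# numeric values, drawn from a different uniform range per bar category
numeric_1 <- c(runif(n = 25, min = 1, max = 3.4),
               runif(n = 25, min = 0.5, max = 2),
               runif(n = 25, min = 1.25, max = 4.2),
               runif(n = 25, min = 1.5, max = 4))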

df <- data.frame(category_1, category_2, numeric_1)
glimpse(df)
## Observations: 100
## Variables: 3
## $ category_1 <fct> A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, ...
## $ category_2 <fct> Z, Y, Z, W, Y, Z, W, Z, Z, Y, Z, W, W, Z, X, Z, Z, ...
## $ numeric_1  <dbl> 2.964633, 2.989781, 1.867876, 2.446609, 2.962878, 2...

Second, we’ll use aggregate() to sum “numeric_1” per “category_1” - “category_2” pair.

df <- aggregate(numeric_1 ~ category_1 + category_2, data = df, FUN = sum)
glimpse(df)
## Observations: 16
## Variables: 3
## $ category_1 <fct> A, B, C, D, A, B, C, D, A, B, C, D, A, B, C, D
## $ category_2 <fct> W, W, W, W, X, X, X, X, Y, Y, Y, Y, Z, Z, Z, Z
## $ numeric_1  <dbl> 14.192245, 3.987331, 14.960232, 26.786882, 5.530439...
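
Since dplyr is already loaded, the same per-pair sums can also be computed with group_by() and summarise(). The sketch below uses a small toy data frame with made-up values (“toy” is a hypothetical name, not the article’s data) so the two approaches can be compared by eye; it is an alternative, not what the rest of this post uses.

```r
library(dplyr)

# Toy data (hypothetical values, not the random draws used in this post)
toy <- data.frame(category_1 = c("A", "A", "A", "B", "B"),
                  category_2 = c("W", "W", "X", "W", "X"),
                  numeric_1  = c(1, 2, 3, 4, 5))

# Base R: one row per category_1 / category_2 pair, summing numeric_1
sums_base <- aggregate(numeric_1 ~ category_1 + category_2, data = toy, FUN = sum)

# dplyr: the same sums via group_by() and summarise()
sums_dplyr <- toy %>%
  group_by(category_1, category_2) %>%
  summarise(numeric_1 = sum(numeric_1)) %>%
  ungroup()
```

Both results hold the same four sums; only the row order differs, which doesn’t matter for plotting.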

Third, our plot will have a (stacked) bar for each unique value of “category_1”, with each segment of each bar representing the summed numeric values per “category_1” - “category_2” value pair.

To start:

ggplot(data = df, aes(x = category_1, y = numeric_1, fill = category_2)) +
  geom_bar(stat = "identity")


Next, we’ll reorder() the levels of “category_1” according to the “numeric_1” sums. Because reorder() sorts the levels in increasing order, we then re-factor “category_1” with the levels reversed so the levels (i.e. bars) will be plotted in decreasing order, left to right.

df$category_1 <- reorder(df$category_1, df$numeric_1, FUN = sum)
df$category_1 <- factor(df$category_1, levels = rev(levels(df$category_1)))

ggplot(data = df, aes(x = category_1, y = numeric_1, fill = category_2)) +
  geom_bar(stat = "identity")
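
To see what the two reordering lines do, here is a minimal sketch on toy data (made-up values; “toy”, “increasing”, “two_step” and “one_step” are hypothetical names). It also shows a one-step alternative, negating the values passed to reorder(), which should give the same level order.

```r
# Toy data (hypothetical values): totals are A = 3, B = 10, C = 7
toy <- data.frame(category_1 = c("A", "A", "B", "B", "C", "C"),
                  numeric_1  = c(1, 2, 5, 5, 3, 4))

# reorder() sorts the levels by increasing per-level sum
increasing <- reorder(toy$category_1, toy$numeric_1, FUN = sum)
levels(increasing)  # "A" "C" "B"

# Two-step approach from the text: reverse the increasing levels
two_step <- factor(increasing, levels = rev(levels(increasing)))

# One-step alternative: negate the values so the largest sum sorts first
one_step <- reorder(toy$category_1, -toy$numeric_1, FUN = sum)
levels(one_step)  # "B" "C" "A"
```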


To control the legend symbol, we need a “dummy” data object that supplies a suitable legend but doesn’t result in any values being plotted. Such a data frame is created below. Note that in our plotting code we suppress the geom_bar() legend and keep the geom_point() legend. Note also that we get a warning message about the ‘removed’ rows, i.e. the rows of “df_dummy”, all of which contain missing values.

df_dummy <- data.frame(category_1 = as.numeric(rep(NA, nrow(df))),
                       category_2 = df$category_2,
                       numeric_1 = as.numeric(rep(NA, nrow(df))))

ggplot(data = df, aes(x = category_1, y = numeric_1, fill = category_2)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  geom_point(data = df_dummy,
             aes(x = category_1, y = numeric_1, color = category_2), size = 5)
## Warning: Removed 16 rows containing missing values (geom_point).
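
As an aside, newer versions of ggplot2 offer another route that may avoid the dummy data frame and its warning. The sketch below assumes ggplot2 >= 3.2.0, where layers accept a key_glyph argument; it is an alternative I’d expect to give a similar legend, not the approach used in the rest of this post. The toy data values and the names “toy” and “p” are hypothetical.

```r
library(ggplot2)

# Toy data (hypothetical values, not the data used in this post)
toy <- data.frame(category_1 = c("A", "A", "B", "B"),
                  category_2 = c("W", "X", "W", "X"),
                  numeric_1  = c(3, 2, 4, 1))

# Assumes ggplot2 >= 3.2.0 (key_glyph argument on layers)
p <- ggplot(toy, aes(x = category_1, y = numeric_1, fill = category_2)) +
  # draw the fill legend keys as points instead of filled squares
  geom_bar(stat = "identity", key_glyph = "point") +
  # shape 21 is a circle that honors the fill aesthetic; size enlarges the key
  guides(fill = guide_legend(override.aes = list(shape = 21, size = 5)))
```

Printing “p” draws the plot. Because the legend here comes from the fill scale, only scale_fill_manual() would be needed to set the colors.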


Lastly, we’ll specify some thematic and feature elements to match the original visualization. Apologies, but the original is not included here.

Notably, we want a narrow gap between each bar segment. We accomplish this by specifying a ‘white’ border for geom_bar(). We specify appropriate colors to match the original’s palette and a ‘blank’ panel background, border and legend title. We shift the legend upward. We put the axes and tick marks in light gray and the axis text in black. We also narrow the bars themselves and dynamically compute the y-axis breaks from the total of the tallest bar.

ggplot(data = df, aes(y = numeric_1, x = category_1, fill = category_2)) +
  geom_bar(stat = "identity", width = 0.7, colour = "white", show.legend = FALSE) +
  labs(x = "", y = "") +
  geom_point(data = df_dummy, aes(x = category_1, y = numeric_1,
                                  color = category_2), size = 5) +
  theme(axis.line = element_line(colour = "grey70"),
        axis.text = element_text(colour = "black"),
        axis.ticks = element_line(colour = "grey70"),
        legend.key = element_blank(),
        legend.position = "right",
        legend.box.margin = margin(0, 0, 230, 0),
        legend.title = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank()) +
  scale_fill_manual(breaks = c("W", "X", "Y", "Z"),
                    values = c("dodgerblue1", "springgreen3", "firebrick2", "gold")) +
  scale_colour_manual(values = c("dodgerblue1", "springgreen3", "firebrick2", "gold")) +
  scale_y_continuous(breaks = seq(from = 0,
                                  to = max(aggregate(numeric_1 ~ category_1,
                                                     data = df, FUN = sum)[, 2]),
                                  by = 10))
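
The y-axis breaks computation is easier to follow when pulled out of the plotting call. Here is a minimal sketch on toy data (made-up values; “toy”, “totals” and “y_breaks” are hypothetical names), using tapply() in place of aggregate() to get the per-bar totals.

```r
# Toy data (hypothetical values): two bars, two segments each
toy <- data.frame(category_1 = c("A", "A", "B", "B"),
                  numeric_1  = c(12, 9.5, 20, 14.2))

# Total height of each stacked bar: A = 21.5, B = 34.2
totals <- tapply(toy$numeric_1, toy$category_1, sum)

# Breaks every 10 units, stopping at the last multiple of 10
# that does not exceed the tallest bar
y_breaks <- seq(from = 0, to = max(totals), by = 10)
y_breaks  # 0 10 20 30
```

Note that this sets where the axis labels fall, not the axis limits themselves; those are still chosen by ggplot2 from the data.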