First we need to reshape the data so that every rating is its own observation: in this case, that means giving cut and clarity ratings separate rows. This can be done using tidyr’s gather function, here gathering every column except carat:

library(ggplot2)
library(dplyr)
library(tidyr)

diamonds_gathered <- diamonds %>% gather(metric, level, -carat)
## Warning: attributes are not identical across measure variables; they will
## be dropped
head(diamonds_gathered)
##   carat metric     level
## 1  0.23    cut     Ideal
## 2  0.21    cut   Premium
## 3  0.23    cut      Good
## 4  0.29    cut   Premium
## 5  0.31    cut      Good
## 6  0.24    cut Very Good

Right now every column other than carat appears as a metric in the data. We need to keep only the rows for cut and clarity. dplyr’s filter function is a natural way to do this:

diamonds_cut_clarity <- diamonds_gathered %>% filter(metric %in% c("cut", "clarity"))
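
As a hedged alternative, newer versions of tidyr (1.1.0 or later is assumed here) offer pivot_longer, which can reshape just the two columns of interest so that no filter step is needed. This is a sketch rather than the approach used in this post; the name diamonds_long is made up for illustration, and its row order differs from gather's output:

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)
library(tidyr)

# Reshape only cut and clarity, keeping carat alongside each rating.
# cut and clarity are ordered factors with different levels, so each is
# converted to character before the two are stacked into one column.
# (diamonds_long is a hypothetical name, not used elsewhere in this post.)
diamonds_long <- diamonds %>%
    select(carat, cut, clarity) %>%
    pivot_longer(c(cut, clarity), names_to = "metric", values_to = "level",
                 values_transform = list(level = as.character))

# Each diamond now contributes one row per metric
count(diamonds_long, metric)
```

Like gather, this leaves level as a character column, so the ordering still has to be set explicitly later on.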

Now we can compare the carat distribution between cut and clarity groups, and facet to distinguish the two metrics:

ggplot(diamonds_cut_clarity, aes(x = level, y = carat)) + geom_boxplot() +
    scale_y_log10() + facet_wrap(~ metric)


This graph is far from perfect. Notably, there are a bunch of empty spaces on each x-axis, because facet_wrap by default forces every facet to share the same x-axis. We can fix this with an argument to facet_wrap, scales = "free_x":

ggplot(diamonds_cut_clarity, aes(x = level, y = carat)) + geom_boxplot() +
    scale_y_log10() + facet_wrap(~ metric, scales = "free_x")


This graph is OK, but has one relevant flaw: the boxplots are in alphabetical order rather than ordered by quality within clarity and cut. Our most complicated step involves fixing that by assigning an explicit ordering to the level column:

lvls <- c(levels(diamonds$clarity), levels(diamonds$cut))
lvls  # this is just the levels in order, one after the other:
##  [1] "I1"        "SI2"       "SI1"       "VS2"       "VS1"      
##  [6] "VVS2"      "VVS1"      "IF"        "Fair"      "Good"     
## [11] "Very Good" "Premium"   "Ideal"
diamonds_cut_clarity$level <- factor(diamonds_cut_clarity$level, levels = lvls)

ggplot(diamonds_cut_clarity, aes(x = level, y = carat)) + geom_boxplot() +
    scale_y_log10() + facet_wrap(~ metric, scales = "free_x")

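
The reordering works because a factor is placed on a discrete axis in the order of its levels, while a plain character column ends up sorted alphabetically. A toy sketch of the difference, using a made-up vector x that is not part of the diamonds data:

```r
# Hypothetical ratings, for illustration only
x <- c("Ideal", "Fair", "Premium", "Fair")

# Without explicit levels, factor() sorts the unique values alphabetically
alphabetical <- levels(factor(x))
alphabetical
## [1] "Fair"    "Ideal"   "Premium"

# With explicit levels, the order we supply is kept
by_quality <- levels(factor(x, levels = c("Fair", "Premium", "Ideal")))
by_quality
## [1] "Fair"    "Premium" "Ideal"
```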

This gets us the plot we were looking for. Reiterated and combined, the full code would look like:

diamonds_cut_clarity <- diamonds %>% gather(metric, level, -carat) %>%
    filter(metric %in% c("cut", "clarity"))
## Warning: attributes are not identical across measure variables; they will
## be dropped
lvls <- c(levels(diamonds$clarity), levels(diamonds$cut))
diamonds_cut_clarity$level <- factor(diamonds_cut_clarity$level, levels = lvls)

ggplot(diamonds_cut_clarity, aes(x = level, y = carat)) + geom_boxplot() +
    scale_y_log10() + facet_wrap(~ metric, scales = "free_x")

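
If you would rather avoid the separate `$<-` assignment, the same steps can probably be written as one pipeline by setting the factor levels inside dplyr's mutate. This is a sketch of that variant, not a change to the approach above; the variable p is introduced only so the plot object can be inspected:

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Clarity levels first, then cut levels, each in quality order
lvls <- c(levels(diamonds$clarity), levels(diamonds$cut))

# gather still warns that attributes are dropped, as in the code above
p <- diamonds %>%
    gather(metric, level, -carat) %>%
    filter(metric %in% c("cut", "clarity")) %>%
    mutate(level = factor(level, levels = lvls)) %>%
    ggplot(aes(x = level, y = carat)) +
    geom_boxplot() +
    scale_y_log10() +
    facet_wrap(~ metric, scales = "free_x")

p  # printing the object draws the plot
```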