What is a Raincloud Plot?

The raincloud plot is a visualization that adds a half-density to a distribution plot. It gets its name because the half-density is shaped like a “raincloud”. The raincloud (half-density) plot enhances the traditional boxplot by revealing multiple modes (an indicator that subgroups may exist). A boxplot does not show where the data are concentrated, but a raincloud plot does. We’ll go through a short tutorial to get you up and running with ggdist to make a raincloud plot.
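
To make the point about hidden modes concrete, here is a minimal sketch in base R. The data are simulated purely for illustration (they are not part of this tutorial's dataset), and count_peaks() is a hypothetical helper written for this example. The five-number summary, which is all a boxplot draws, gives no hint of two clusters, while a kernel density estimate shows them.

```r
set.seed(42)

# Hypothetical sample made of two well-separated clusters
two_groups <- c(rnorm(200, mean = 20, sd = 1.5),
                rnorm(200, mean = 32, sd = 1.5))

# What a boxplot summarizes: min, lower hinge, median, upper hinge, max
fivenum(two_groups)

# Hypothetical helper: count prominent local maxima of the density estimate.
# Peaks lower than 10% of the tallest peak are ignored as noise.
count_peaks <- function(x, min_rel_height = 0.1) {
  y <- density(x)$y
  n <- length(y)
  is_peak <- c(FALSE,
               y[2:(n - 1)] > y[1:(n - 2)] & y[2:(n - 1)] >= y[3:n],
               FALSE)
  sum(is_peak & y >= min_rel_height * max(y))
}

peaks_found <- count_peaks(two_groups)
peaks_found
```

The half-density in a raincloud plot draws exactly this kind of density curve, so the two clusters become visible at a glance.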

Raincloud Plots with ggdist

This tutorial showcases the power of ggdist for visualizing distributions. The ggdist package is a ggplot2 extension made for visualizing distributions and uncertainty. We’ll see how ggdist can be used to make a raincloud plot.

Load the Libraries and Data

To load the libraries and data, run the following code.

#--- LIBRARIES ---

library(tidyverse)
library(tidyquant)
library(ggdist)
library(ggthemes)

#--- DATA ---

mpg
## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # … with 224 more rows

Raincloud Plot: Using ggplot

Next, we’ll make a Raincloud plot that highlights the distribution of Vehicle Fuel Economy (MPG) by Engine Size (Number of Cylinders).

Make the ggplot2 canvas

The first step is to make the ggplot2 canvas. We:

  1. Prepare the data: Using filter(), we isolate the most common (frequent) vehicle engine sizes.
  2. Map the columns: Using ggplot(), we map the cyl and hwy columns. We also convert the numeric cyl column to a discrete one with factor().

mpg %>% 
  filter(cyl %in% c(4, 6, 8)) %>% 
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl)))

This produces a blank plot, which is the first layer. You can see that the x-axis is labeled “factor(cyl)” and the y-axis is “hwy” indicating the data has been mapped to the visualization.
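
If you want to check which engine sizes are common before filtering, a quick count helps. This is a minimal sketch: the cutoff of 10 observations is an arbitrary choice for illustration, not something prescribed by ggdist.

```r
library(ggplot2)  # provides the mpg dataset

# Count vehicles per cylinder class
cyl_counts <- table(mpg$cyl)
cyl_counts

# Keep only classes with enough observations for a meaningful density;
# the threshold of 10 is an assumption made for this example
common_cyl <- as.integer(names(cyl_counts[cyl_counts >= 10]))
common_cyl
```

Only a handful of five-cylinder cars appear in the data, which is why the filter keeps 4, 6, and 8 cylinders.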

Add the Rainclouds with stat_halfeye()

Next, we add our first geometry layer using ggdist::stat_halfeye(). This produces a half-eye visualization, which contains a half-density (the slab) and a point interval. We remove the point interval by setting .width = 0 and point_colour = NA. The half-density remains.

mpg %>% 
  filter(cyl %in% c(4, 6, 8)) %>% 
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  )

And here’s the output. We can see the half-density distributions for fuel economy (hwy) by engine size (cyl).
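
To get a feel for what adjust = 0.5 does, here is a rough sketch using base R’s density(). ggdist may use a different default bandwidth estimator than base R, so treat this as an illustration of the idea (adjust multiplies the bandwidth), not a reproduction of what stat_halfeye() computes internally.

```r
library(ggplot2)  # provides the mpg dataset

# Highway mileage for four-cylinder cars
hwy_4cyl <- mpg$hwy[mpg$cyl == 4]

d_default <- density(hwy_4cyl)                # adjust = 1 (default)
d_narrow  <- density(hwy_4cyl, adjust = 0.5)  # bandwidth multiplied by 0.5

# The reported bandwidth is halved, so the curve follows the data more closely
c(default = d_default$bw, narrow = d_narrow$bw)
```

A smaller bandwidth shows more bumps in the density; a larger one smooths them away. Try a few values to see which best reflects your data.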

Add the Boxplot with geom_boxplot()

Next, add the second geometry layer using ggplot2::geom_boxplot(). This produces a narrow boxplot. We reduce the width and adjust the opacity.

mpg %>% 
  filter(cyl %in% c(4, 6, 8)) %>% 
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  ) +
  geom_boxplot(
    width = 0.12,
    # removing outliers
    outlier.color = NA,
    alpha = 0.5
  )

And here’s the output. We now have a boxplot and a half-density. We can see how the distributions vary compared to the median and interquartile range.
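
If you want the numbers behind the boxes, the sketch below computes the quartiles, median, and interquartile range per engine size with dplyr. The name box_summary is just a label chosen for this example.

```r
library(dplyr)

# Quartiles, median, and IQR of highway mileage for each engine size
box_summary <- ggplot2::mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  group_by(cyl) %>%
  summarise(
    q1     = quantile(hwy, 0.25),
    median = median(hwy),
    q3     = quantile(hwy, 0.75),
    iqr    = IQR(hwy)
  )

box_summary
```

Each row corresponds to one box in the plot: the box spans q1 to q3, and the line inside it marks the median.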

Add the Dot Plots with stat_dots()

Next, add the third geometry layer using ggdist::stat_dots(). This produces a half-dotplot, which is similar to a histogram that indicates the number of samples (number of dots) in each bin. We select side = "left" to indicate we want it on the left-hand side.

mpg %>% 
  filter(cyl %in% c(4, 6, 8)) %>% 
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  ) +
  geom_boxplot(
    width = 0.12,
    # removing outliers
    outlier.color = NA,
    alpha = 0.5
  ) +
  stat_dots(
    # plotting on left side
    side = "left",
    # adjusting position
    justification = 1.1,
    # adjust grouping (binning) of observations
    binwidth = 0.25
  )

And here’s the output. We now have the three main geometries completed.
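
Because hwy is recorded in whole miles per gallon, many cars share the same value, and those ties show up as stacks of dots. As a rough sketch (the exact dot layout is decided by ggdist, so this is only an approximation), counting observations per value gives an idea of how tall each stack will be. The column name n_dots is made up for this example.

```r
library(dplyr)

# Number of cars at each highway mileage value, per engine size
dot_counts <- ggplot2::mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  count(cyl, hwy, name = "n_dots")

# The most crowded mileage values
dot_counts %>%
  arrange(desc(n_dots)) %>%
  head()
```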

Making the plot look professional

We can clean up our plot with a professional-looking theme using tidyquant::theme_tq(). We’ll also rotate it with coord_flip() to give it the raincloud appearance.

Theme_tq

mpg %>% 
  filter(cyl %in% c(4, 6, 8)) %>% 
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  ) +
  geom_boxplot(
    width = 0.12,
    # removing outliers
    outlier.color = NA,
    alpha = 0.5
  ) +
  stat_dots(
    # plotting on left side
    side = "left",
    # adjusting position
    justification = 1.1,
    # adjust grouping (binning) of observations
    binwidth = 0.25
  ) +
  # Themes and Labels
  scale_fill_tq() +
  theme_tq() +
  labs(
    title = "RainCloud Plot",
    x = "Engine Size",
    y = "Highway Fuel",
    fill = "Cylinders"
  ) +
  coord_flip()

We’ve just finalized our plot.

Theme Tableau

mpg %>% 
  filter(cyl %in% c(4, 6, 8)) %>% 
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  ) +
  geom_boxplot(
    width = 0.12,
    # removing outliers
    outlier.color = NA,
    alpha = 0.5
  ) +
  stat_dots(
    # plotting on left side
    side = "left",
    # adjusting position
    justification = 1.1,
    # adjust grouping (binning) of observations
    binwidth = 0.25
  ) +
  # Themes and Labels
  scale_fill_tableau("Tableau 20", name = NULL) +
  labs(
    title = "RainCloud Plot",
    x = "Engine Size",
    y = "Highway Fuel",
    fill = "Cylinders"
  ) +
  coord_flip()

You can customize every element of a ggplot2 plot.
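
One way to try several looks without repeating the layers is to store the plot in an object and add a theme afterwards. This is a minimal sketch: the object names are made up for this example, and theme_minimal() and theme_classic() are built-in ggplot2 themes used here as stand-ins for whichever theme you prefer.

```r
library(ggplot2)
library(ggdist)
library(dplyr)

# Build the three raincloud layers once and keep them in an object
raincloud_base <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  # half-density, shifted to the right, point interval removed
  stat_halfeye(adjust = 0.5, justification = -0.2,
               .width = 0, point_colour = NA) +
  # narrow boxplot without outlier points
  geom_boxplot(width = 0.12, outlier.color = NA, alpha = 0.5) +
  # dots on the left-hand side
  stat_dots(side = "left", justification = 1.1, binwidth = 0.25) +
  labs(title = "RainCloud Plot", x = "Engine Size",
       y = "Highway Fuel", fill = "Cylinders") +
  coord_flip()

# Add a different theme to the same base plot
raincloud_minimal <- raincloud_base + theme_minimal()
raincloud_classic <- raincloud_base + theme_classic()

# Print one of them to draw it
raincloud_minimal
```

Any scale or theme from the sections above can be added to raincloud_base in the same way.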

Summary

We learned how to make raincloud plots with ggdist. But there’s a lot more to visualization.

It’s critical to learn how to visualize with ggplot2, which is the premier framework for data visualization in R.

If you’d like to learn ggplot2, data visualization, and data science for business with R, stay tuned with rana2hin.