The raincloud plot is a visualization that adds a half-density to a distribution plot. It gets its name because the density resembles a cloud and the raw data points plotted beneath it resemble rain. The raincloud (half-density) plot enhances the traditional boxplot by highlighting multiple modalities (an indicator that groups may exist). The boxplot does not show where densities are clustered, but the raincloud plot does! We’ll go through a short tutorial to get you up and running with ggdist to make a raincloud plot.
ggdist
This tutorial showcases the awesome power of ggdist for visualizing distributions. The ggdist package is a ggplot2 extension made for visualizing distributions and uncertainty. We’ll see how ggdist can be used to make a raincloud plot.
To load the libraries and data, run the following code.
#--- LIBRARIES ---
library(tidyverse)
library(tidyquant)
library(ggdist)
library(ggthemes)
#--- DATA ---
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # … with 224 more rows
Next, we’ll make a Raincloud plot that highlights the distribution of Vehicle Fuel Economy (MPG) by Engine Size (Number of Cylinders).
The first step is to make the ggplot2 canvas:
Use filter() to isolate the most common (frequent) vehicle engine sizes.
With ggplot(), map the cyl and hwy columns. We also convert the numeric cyl column to a discrete one with factor().
mpg %>%
filter(cyl %in% c(4, 6, 8)) %>%
ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl)))
This produces a blank plot, which is the first layer. You can see that the x-axis is labeled “factor(cyl)” and the y-axis is “hwy” indicating the data has been mapped to the visualization.
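To see why the filter keeps 4, 6, and 8 cylinders, it can help to count how many vehicles fall into each group. The snippet below is a minimal sketch that assumes only dplyr and the mpg dataset shipped with ggplot2; the name cyl_counts is introduced here purely for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Count vehicles per number of cylinders, most frequent first
cyl_counts <- mpg %>%
  count(cyl, sort = TRUE)

cyl_counts
```

The 5-cylinder group has only a few rows, which is too little data for a meaningful density, so it is dropped before plotting.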
stat_halfeye()
Next, we add our first geometry layer using ggdist::stat_halfeye(). This produces a half-eye visualization, which contains a half-density (the slab) and a point interval. We remove the point interval by setting .width = 0 and point_colour = NA. The half-density remains.
mpg %>%
filter(cyl %in% c(4, 6, 8)) %>%
ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
# add half-violin from {ggdist} package
stat_halfeye(
# adjust bandwidth
adjust = 0.5,
# move to the right
justification = -0.2,
# remove the point interval
.width = 0,
point_colour = NA
)
And here’s the output. We can see the half-density distributions for fuel economy (hwy) by engine size (cyl).
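The adjust argument scales the kernel bandwidth, so values below 1 give a wigglier density that follows local clusters more closely. As a rough analogy (ggdist may use a different density estimator and bandwidth rule internally, so treat this as an assumption-laden sketch rather than what stat_halfeye() does under the hood), base R's density() exposes the same idea:

```r
library(ggplot2)  # provides the mpg dataset

hwy <- mpg$hwy

# Kernel density with the default bandwidth, and with the bandwidth halved
d_default <- density(hwy)
d_narrow  <- density(hwy, adjust = 0.5)

# The effective bandwidth is the default bandwidth multiplied by `adjust`
c(default = d_default$bw, narrow = d_narrow$bw)
```

Halving the bandwidth is what lets the separate bumps (modes) in the half-density show up instead of being smoothed together.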
geom_boxplot()
Next, add the second geometry layer using ggplot2::geom_boxplot(). This produces a narrow boxplot. We reduce the width and adjust the opacity.
mpg %>%
filter(cyl %in% c(4, 6, 8)) %>%
ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
# add half-violin from {ggdist} package
stat_halfeye(
# adjust bandwidth
adjust = 0.5,
# move to the right
justification = -0.2,
# remove the point interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = 0.12,
# removing outliers
outlier.color = NA,
alpha = 0.5
)
And here’s the output. We now have a boxplot and half-density. We can see how the distributions vary compared to the median and interquartile range.
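The numbers the boxplot encodes can also be computed directly, which makes it easier to compare them against the shape of the density. This is a small sketch using dplyr; the column names q1, med, q3, and iqr are chosen here for illustration only.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Quartiles, median, and interquartile range of highway MPG per engine size
box_stats <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  group_by(cyl) %>%
  summarise(
    q1  = quantile(hwy, 0.25, names = FALSE),
    med = median(hwy),
    q3  = quantile(hwy, 0.75, names = FALSE),
    iqr = IQR(hwy),
    .groups = "drop"
  )

box_stats
```

Each row corresponds to one box: q1 and q3 are the box edges, med is the middle line, and iqr is the box length.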
stat_dots()
Next, add the third geometry layer using ggdist::stat_dots(). This produces a half-dotplot, which is similar to a histogram in that it indicates the number of samples (number of dots) in each bin. We select side = "left" to indicate we want it on the left-hand side.
mpg %>%
filter(cyl %in% c(4, 6, 8)) %>%
ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
# add half-violin from {ggdist} package
stat_halfeye(
# adjust bandwidth
adjust = 0.5,
# move to the right
justification = -0.2,
# remove the point interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = 0.12,
# removing outliers
outlier.color = NA,
alpha = 0.5
) +
stat_dots(
# plotting on the left side
side = "left",
# adjusting position
justification = 1.1,
# adjust grouping (binning) of observations
binwidth = 0.25
)
And here’s the output. We now have the three main geometries completed.
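Because hwy is recorded as whole numbers, many vehicles share the same value, and those are the observations that pile up as stacked dots. The sketch below tabulates how many observations sit at each hwy value per engine size. It is only an approximation of what the dotplot shows, since stat_dots() applies its own binning and layout; the names dot_counts and n_dots are made up for this example.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Number of observations at each highway MPG value, per engine size
dot_counts <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  count(cyl, hwy, name = "n_dots")

# The most heavily populated value(s) in each group
dot_counts %>%
  group_by(cyl) %>%
  slice_max(n_dots, n = 1)
```

The tallest stacks in the dotplot should line up roughly with the peaks of the half-density for the same group.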
We can clean up our plot with a professional-looking theme using tidyquant::theme_tq(). We’ll also rotate it with coord_flip() to give it the raincloud appearance.
mpg %>%
filter(cyl %in% c(4, 6, 8)) %>%
ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
# add half-violin from {ggdist} package
stat_halfeye(
# adjust bandwidth
adjust = 0.5,
# move to the right
justification = -0.2,
# remove the point interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = 0.12,
# removing outliers
outlier.color = NA,
alpha = 0.5
) +
stat_dots(
# plotting on the left side
side = "left",
# adjusting position
justification = 1.1,
# adjust grouping (binning) of observations
binwidth = 0.25
) +
# Themes and Labels
scale_fill_tq() +
theme_tq() +
labs(
title = "RainCloud Plot",
x = "Engine Size",
y = "Highway Fuel",
fill = "Cylinders"
) +
coord_flip()
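We’ve just finalized our plot. As a variation, the version below swaps the tidyquant theme and fill scale for the “Tableau 20” palette from ggthemes.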
mpg %>%
filter(cyl %in% c(4, 6, 8)) %>%
ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
# add half-violin from {ggdist} package
stat_halfeye(
# adjust bandwidth
adjust = 0.5,
# move to the right
justification = -0.2,
# remove the point interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = 0.12,
# removing outliers
outlier.color = NA,
alpha = 0.5
) +
stat_dots(
# plotting on the left side
side = "left",
# adjusting position
justification = 1.1,
# adjust grouping (binning) of observations
binwidth = 0.25
) +
# Themes and Labels
scale_fill_tableau("Tableau 20", name = NULL) +
labs(
title = "RainCloud Plot",
x = "Engine Size",
y = "Highway Fuel",
fill = "Cylinders"
) +
coord_flip()
You can always customize every element of a ggplot.
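One customization worth knowing: coord_flip() is not the only route to the horizontal raincloud layout. As a possible alternative, assuming reasonably recent versions of ggplot2 (3.3.0 or later) and ggdist that detect orientation from the aesthetics, the continuous variable can be mapped to x and the discrete one to y. The sketch below mirrors the settings used earlier and stores the result in a variable p named here for illustration.

```r
library(dplyr)
library(ggplot2)
library(ggdist)

p <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  # continuous variable on x, discrete variable on y: no coord_flip() needed
  ggplot(aes(x = hwy, y = factor(cyl), fill = factor(cyl))) +
  # half-density drawn above each group's baseline
  stat_halfeye(
    adjust = 0.5,
    justification = -0.2,
    .width = 0,
    point_colour = NA
  ) +
  # narrow boxplot without outlier points
  geom_boxplot(width = 0.12, outlier.color = NA, alpha = 0.5) +
  # dots drawn below each group's baseline (the "rain")
  stat_dots(side = "bottom", justification = 1.1, binwidth = 0.25) +
  labs(
    title = "RainCloud Plot",
    x = "Highway Fuel",
    y = "Engine Size",
    fill = "Cylinders"
  )

p
```

With this mapping the axis labels in labs() refer to the axes as they appear on screen, which avoids the x/y swap that coord_flip() introduces.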
We learned how to make raincloud plots with ggdist. But there’s a lot more to visualization. It’s critical to learn how to visualize with ggplot2, which is the premier framework for data visualization in R. If you’d like to learn ggplot2, data visualization, and data science for business with R, stay tuned with rana2hin.