The raincloud plot is a visualization that adds a half-density to
a distribution plot. It gets its name because the density plot is
shaped like a “raincloud”. The raincloud (half-density) plot
enhances the traditional boxplot by highlighting multiple modalities
(an indicator that groups may exist). The boxplot does not show where
densities are clustered, but the raincloud plot does! We’ll go through a
short tutorial to get you up and running with ggdist to
make a raincloud plot.
ggdist
This tutorial showcases the awesome power of ggdist for
visualizing distributions. The ggdist package is a
ggplot2 extension made for visualizing
distributions and uncertainty. We’ll see how ggdist
can be used to make a raincloud plot.
To load the libraries and data, run the following code.
#--- LIBRARIES ---
library(tidyverse)
library(tidyquant)
library(ggdist)
library(ggthemes)
#--- DATA ---
mpg
## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # … with 224 more rows
Next, we’ll make a Raincloud plot that highlights the distribution of Vehicle Fuel Economy (MPG) by Engine Size (Number of Cylinders).
The first step is to make the ggplot2 canvas. We:
Use filter() to isolate the most common (frequent) vehicle engine sizes: 4, 6, and 8 cylinders.
In ggplot(), map the cyl and hwy columns. We also convert the numeric cyl column to a discrete one with factor().
mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl)))
This produces a blank plot, which is the first layer. You can see that the x-axis is labeled “factor(cyl)” and the y-axis is labeled “hwy”, indicating that the data has been mapped to the visualization.
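To see why the filter keeps only 4, 6, and 8 cylinders, it can help to count the vehicles per engine size first. This is a minimal sketch that assumes dplyr and ggplot2 are available (both are attached by tidyverse); the name cyl_counts is only illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Count vehicles per cylinder count, most frequent first
cyl_counts <- mpg %>%
  count(cyl, sort = TRUE)

cyl_counts
```

The least frequent engine size is the one the filter drops, so the remaining groups each have enough observations for a meaningful density.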
stat_halfeye()
Next, we add our first geometry layer using
ggdist::stat_halfeye(). This produces a Half Eye
visualization, which contains a half-density (the slab) and a point
interval. We remove the point interval by setting .width = 0 and
point_colour = NA. The half-density remains.
mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  )
And here’s the output. We can see the half-density distributions for fuel economy (hwy) by engine size (cyl).
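The adjust argument controls how smooth the half-density looks. The sketch below uses stats::density() as a stand-in to show the idea; it assumes stat_halfeye() treats adjust the same way, as a multiplier on the automatically chosen bandwidth, which may differ in detail between ggdist versions. The variable names are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Highway fuel economy for 4-cylinder vehicles only
hwy_4cyl <- mpg$hwy[mpg$cyl == 4]

# Kernel density estimates with the default bandwidth and with half of it
dens_default <- density(hwy_4cyl, adjust = 1)
dens_narrow  <- density(hwy_4cyl, adjust = 0.5)

# adjust multiplies the selected bandwidth, so this ratio is 0.5
bw_ratio <- dens_narrow$bw / dens_default$bw
bw_ratio
```

A smaller bandwidth follows the data more closely, which is what lets separate modes show up in the cloud instead of being smoothed away.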
geom_boxplot()
Next, add the second geometry layer using
ggplot2::geom_boxplot(). This produces a narrow boxplot. We
reduce the width and adjust the opacity.
mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  ) +
  geom_boxplot(
    width = 0.12,
    # hide the outlier points
    outlier.color = NA,
    alpha = 0.5
  )
And here’s the output. We now have a boxplot and half-density. We can see how the distributions vary compared to the median and interquartile range.
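The numbers behind each box can be computed directly, which makes it easier to compare them with the density above it. This is a rough sketch using dplyr; box_stats and its column names are illustrative, and the quartiles use R's default quantile method, which should agree closely with what the boxplot draws.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Quartiles, median, and interquartile range of hwy per engine size
box_stats <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  group_by(cyl) %>%
  summarise(
    q1     = quantile(hwy, 0.25),
    median = median(hwy),
    q3     = quantile(hwy, 0.75),
    iqr    = IQR(hwy)
  )

box_stats
```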
stat_dots()
Next, add the third geometry layer using
ggdist::stat_dots(). This produces a half-dotplot, which is
similar to a histogram in that it shows the number of samples (number of
dots) in each bin. We select side = "left" to indicate we
want it on the left-hand side.
mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  ) +
  geom_boxplot(
    width = 0.12,
    # hide the outlier points
    outlier.color = NA,
    alpha = 0.5
  ) +
  stat_dots(
    # plotting on left side
    side = "left",
    # adjusting position
    justification = 1.1,
    # adjust grouping (binning) of observations
    binwidth = 0.25
  )
And here’s the output. We now have the three main geometries completed.
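The height of each dot stack reflects how many vehicles share a fuel-economy value. Because hwy is stored as whole numbers, counting rows per value gives a rough picture of what the dots encode. This is only a sketch: it assumes each distinct hwy value ends up in its own stack at binwidth = 0.25, and dot_counts and n_dots are illustrative names.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Number of vehicles at each hwy value within each engine size
dot_counts <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  count(cyl, hwy, name = "n_dots")

# The tallest stack in each group
dot_counts %>%
  group_by(cyl) %>%
  slice_max(n_dots, n = 1)
```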
We can clean up our plot with a professional-looking theme using
tidyquant::theme_tq(). We’ll also rotate it with
coord_flip() to give it the raincloud appearance.
mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  ) +
  geom_boxplot(
    width = 0.12,
    # hide the outlier points
    outlier.color = NA,
    alpha = 0.5
  ) +
  stat_dots(
    # plotting on left side
    side = "left",
    # adjusting position
    justification = 1.1,
    # adjust grouping (binning) of observations
    binwidth = 0.25
  ) +
  # Themes and Labels
  scale_fill_tq() +
  theme_tq() +
  labs(
    title = "RainCloud Plot",
    x = "Engine Size",
    y = "Highway Fuel",
    fill = "Cylinders"
  ) +
  coord_flip()
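We’ve just finalized our plot. The same layers also work with other palettes. Here the tidyquant theme and fill scale are replaced with ggthemes::scale_fill_tableau().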
mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  # add half-violin from {ggdist} package
  stat_halfeye(
    # adjust bandwidth
    adjust = 0.5,
    # move to the right
    justification = -0.2,
    # remove the point interval
    .width = 0,
    point_colour = NA
  ) +
  geom_boxplot(
    width = 0.12,
    # hide the outlier points
    outlier.color = NA,
    alpha = 0.5
  ) +
  stat_dots(
    # plotting on left side
    side = "left",
    # adjusting position
    justification = 1.1,
    # adjust grouping (binning) of observations
    binwidth = 0.25
  ) +
  # Themes and Labels
  scale_fill_tableau("Tableau 20", name = NULL) +
  labs(
    title = "RainCloud Plot",
    x = "Engine Size",
    y = "Highway Fuel",
    fill = "Cylinders"
  ) +
  coord_flip()
You can always customize every element of ggplot.
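One way to experiment with customizations without repeating every layer is to store the plot in an object and add to it. The following is a sketch rather than a recommended style: p_rain and p_custom are hypothetical names, and theme_minimal() with a hidden legend is just one possible choice.

```r
library(dplyr)
library(ggplot2)
library(ggdist)

# Build the three raincloud layers once and keep them in an object
p_rain <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
  stat_halfeye(adjust = 0.5, justification = -0.2, .width = 0, point_colour = NA) +
  geom_boxplot(width = 0.12, outlier.color = NA, alpha = 0.5) +
  stat_dots(side = "left", justification = 1.1, binwidth = 0.25) +
  coord_flip()

# Add theme and label tweaks on top of the stored plot
p_custom <- p_rain +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(x = "Engine Size (Cylinders)", y = "Highway Fuel Economy (MPG)")

p_custom
```

Since the x-axis already identifies each group, dropping the legend here removes a redundant element; any other theme() setting can be added in the same way.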
We learned how to make raincloud plots with ggdist. But
there’s a lot more to visualization.
It’s critical to learn how to visualize with ggplot2,
which is the premier framework for data visualization in R.
If you’d like to learn ggplot2, data visualization, and
data science for business with R, stay tuned with rana2hin.