I saw this tweet this morning and was reminded of how cool raincloud plots are. With all the rain we have had in the last little bit, I thought it would be fun to try making a penguin raincloud plot.

That moment when you review a journal submission and you see dynamite plots: rage and joy at the same time! #dataviz #datavisualization #barbarplot #DoBetter pic.twitter.com/ncuxquEoAF
— Cédric Scherer (@CedScherer) March 26, 2021

load packages

library(tidyverse)
library(palmerpenguins)
library(ggeasy)
library(harrypotter)
library(gt)
library(cowplot)

source("theme_jen.R") # a script containing my custom ggplot theme

read the data

penguins <- penguins

get summary stats

penguin_summary <- penguins %>%
  group_by(species) %>%
  summarise(mean = mean(body_mass_g, na.rm = TRUE), 
            sd = sd(body_mass_g, na.rm = TRUE), 
            n = n(), 
            stderr = sd/sqrt(n))

gt(penguin_summary)

species	mean	sd	n	stderr
Adelie	3700.662	458.5661	152	37.19462
Chinstrap	3733.088	384.3351	68	46.60747
Gentoo	5076.016	504.1162	124	45.27097

make a psych standard plot

penguin_summary %>%
  ggplot(aes(x = species, y = mean, fill = species)) +
  geom_col() +
  scale_fill_hp_d(option = "HermioneGranger") +
  geom_errorbar(aes(ymin=mean-stderr, ymax=mean+stderr),
                  size=.5,    #  line thickness
                    width=.2)  + # bar width 
   labs(title = "Mean body mass by penguin species", 
        x = "Species", 
        y = "Mean body mass (g)") +
  theme_jen() +
  scale_y_continuous(expand = c(0, 0)) +
  easy_remove_legend()

It is really common in psychology to average the hell out of your data before you plot it. The downside of this practice is that you lose all the interesting information about variability. I think the power of R comes from how easy it is to plot raw data. When you plot scores of individual participants (or even individual participants scores on individual trials) you can see patterns in your data that would have been lost had you just plotted means with standard error bars.

AND it is super helpful for catching data entry errors and/or outliers. Here are a few options for plotting raw data…

`geom_point()`

geom_point() is the simplest way to see each individual data point, but it will plot individual points on top of each other, losing important information.

penguins %>%
  ggplot(aes(x = species, y = body_mass_g, colour = species)) +
  geom_point() +
  scale_colour_hp_d(option = "NewtScamander") +
   labs(title = "Mean body mass by penguin species", 
        x = "Species", 
        y = "Mean body mass (g)") +
  theme_jen() +
  scale_y_continuous(expand = c(0, 0), limits = c(2000,6500)) +
  easy_remove_legend()

`geom_jitter()`

The geom_jitter() function adds a tiny bit of noise, so that you don’t get overlapping points

penguins %>%
  ggplot(aes(x = species, y = body_mass_g, colour = species)) +
  geom_jitter(width = 0.3) + # width controls how much noise/spread 
  scale_colour_hp_d(option = "HermioneGranger") +
   labs(title = "Mean body mass by penguin species", 
        x = "Species", 
        y = "Mean body mass (g)") +
  theme_jen() +
  scale_y_continuous(expand = c(0, 0), limits = c(2000,6500)) +
  easy_remove_legend()

`geom_violin()`

You can add a geom_violin to your jittered points to see the distribution of the data.

penguins %>%
  ggplot(aes(x = species, y = body_mass_g, fill = species)) +
  geom_violin(alpha = 0.5) + # alpha controls transparency
  geom_jitter(width = 0.3, alpha = 0.5) +
  scale_fill_hp_d(option = "HermioneGranger") +
   labs(title = "Mean body mass by penguin species", 
        x = "Species", 
        y = "Mean body mass (g)") +
  theme_jen() +
  scale_y_continuous(expand = c(0, 0), limits = c(2000,6500)) +
  easy_remove_legend()

geom_boxplot

Boxplots are also great at visualising variability and outliers. You can layer jittered points on top of a box plot too.

penguins %>%
  ggplot(aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot(alpha = 0.5) + # alpha controls transparency
  geom_jitter(width = 0.3, alpha = 0.5) +
  scale_fill_hp_d(option = "HermioneGranger") +
   labs(title = "Mean body mass by penguin species", 
        x = "Species", 
        y = "Mean body mass (g)") +
  theme_jen() +
  scale_y_continuous(expand = c(0, 0), limits = c(2000,6500)) +
  easy_remove_legend()

raincloud plot

Raincloud plots are a particularly nice way to represent violins, boxplots AND points without all the overlap. The last time I tried them I was following this post and it involved loading a custom script from someone’s github gist to get the geom_flat_violin() function. Sadly the link in that blogpost no longer works…

But apparently there is a new raincloud plots package!

Lets try it!

Option 1: `raincloudplots` package

Install the package it using the remotes package.

# remotes::install_github('jorvlan/raincloudplots')

library(raincloudplots)

The paper that was published with the original code/tutorial has a V2 update, which I am assuming is the best place to find info about the new package? Seems like the paper now contains tutorials about how to make raincloud via the modifiable scripts AND how to use the new R package (as well as how to make raincloud plots with Python and Matlab- comprehensive).

The descriptions in the paper are mostly about using rainclouds to visualise repeated-measures data, which is not really what I want for the penguins.

Maybe the vignette is better…

https://github.com/jorvlan/raincloudplots

copy code from the vignette

The first thing I always do when trying a new package is to copy the code from the vignette and make sure I can make it run. This example uses the iris data. Apparently the data_1x1() function makes the data into the format you need.

Here I am using sepal.length and width (rather than two different segments of sepal length data) to test whether it will work with different columns.

sepalLW <- data_1x1(
  array_1 = iris$Sepal.Length[1:50],
  array_2 = iris$Sepal.Width[1:50],
  jit_distance = .09,
  jit_seed = 321)

And then the raincloud_1x1() function makes the plot.

raincloud_sepalLW <- raincloud_1x1(
  data = sepalLW, 
  colors = (c('dodgerblue','darkorange')), 
  fills = (c('dodgerblue','darkorange')), 
  size = 1, 
  alpha = .6, 
  ort = 'h') +
scale_x_continuous(breaks=c(1,2), labels=c("width", "length"), limits=c(0, 3)) +
  xlab("sepal part") + 
  ylab("measurement") +
  theme_classic()

raincloud_sepalLW

adapt for penguins

The arrays seem to need indexed values (they probably don’t but that it how it is presented in the vignette, so lets try that to begin with). First, sort the penguins by species so can refer to them using indexing. Then use the data_1x1 function to set up the data…

sorted_penguins <- penguins %>%
  arrange(species)


species_bodymass <- data_1x1(
  array_1  = sorted_penguins$body_mass_g[153:220], # chinstrap
  array_2 = sorted_penguins$body_mass_g[221:344], # gentoo 
  jit_distance = .09,
  jit_seed = 321)

head(species_bodymass)

##   y_axis x_axis id       jit
## 1   3500      1  1 1.0820609
## 2   3900      1  2 1.0787114
## 3   3650      1  3 0.9528797
## 4   3525      1  4 0.9559133
## 5   3725      1  5 0.9802922
## 6   3950      1  6 0.9714124

Make the plot …

raincloud_penguins_H <- raincloud_1x1(
  data = species_bodymass, 
  colors = (c('dodgerblue','darkorange')), 
  fills = (c('dodgerblue','darkorange')), 
  size = 1, 
  alpha = .6, 
  ort = 'h') +
scale_x_continuous(breaks=c(1,2), labels=c("chinstrap", "gentoo"), limits=c(0, 3)) +
  xlab("Species") + 
  ylab("Body Mass (g)") +
  theme_classic()

raincloud_penguins_H

In the example, ort = was set to ‘h’, but you can turn the plot vertical with ort = ‘v’

raincloud_penguins_V <- raincloud_1x1(
  data = species_bodymass, 
  colors = (c('dodgerblue','darkorange')), 
  fills = (c('dodgerblue','darkorange')), 
  size = 1, 
  alpha = .6, 
  ort = 'v') +
scale_x_continuous(breaks=c(1,2), labels=c("chinstrap", "gentoo"), limits=c(0, 3)) +
  xlab("Species") + 
  ylab("Body Mass (g)") +
  theme_classic()

raincloud_penguins_V

repeated measures

The package will connect points for repeated measures designs. This is less relevant in the penguin data, but I guess we could try it out by plotting bill length and depth for each penguin?

Copy the code from the vignette and sub in penguin variables.

species_bills <- data_2x2(
  array_1 = sorted_penguins$bill_length_mm[153:220],
  array_2 = sorted_penguins$bill_depth_mm[153:220],
  array_3 = sorted_penguins$bill_length_mm[221:344],
  array_4 = sorted_penguins$bill_depth_mm[221:344],
  labels = (c('chinstrap','gentoo')),
  jit_distance = .09,
  jit_seed = 321,
  spread_x_ticks = TRUE)

Plot 2x2_repmes with connected points.

speciesbills <- raincloud_2x2_repmes(
  data = species_bills,
  colors = (c('dodgerblue', 'darkorange', 'dodgerblue', 'darkorange')),
  fills = (c('dodgerblue', 'darkorange', 'dodgerblue', 'darkorange')),
  size = 1,
  alpha = .6,
  spread_x_ticks = FALSE) +
scale_x_continuous(breaks=c(1,2), labels=c("length", "width"), limits=c(0, 3)) +
  xlab("Bill part") + 
  ylab("Measurement") +
  theme_classic()

speciesbills

agghh… it seems to have lost the species info…bummer. I don’t have the energy to work out what is going on at the moment.

I think what I don’t like about this package is that there is some magic happening in the formatting of the data which isn’t very transparent. When I look at the species_bills dataframe, info about species is still there (as a group variable) but I don’t know how to get it into the plot.

head(species_bills)

##   y_axis x_axis id     group       jit
## 1   46.5      1  1 chinstrap 1.0820609
## 2   50.0      1  2 chinstrap 1.0787114
## 3   51.3      1  3 chinstrap 0.9528797
## 4   45.4      1  4 chinstrap 0.9559133
## 5   52.7      1  5 chinstrap 0.9802922
## 6   45.2      1  6 chinstrap 0.9714124

I seem to remember that the old script version constructed the plot with layers within ggplot, which I understood how to customise much better. Lets try that instead.

Option 2: the old script method

The link to the github script in the blog post doesn’t work anymore. Maybe I can download the script and just source my own copy of it?

I copied the code from here into an R script and saved it in my project folder. Then I can run the script using source().

source("R_rainclouds.R")

The R_rainclouds.R script gives you access to geom_flat_violin(). Make a plot that layers geom_flat_violin + geom_point + geom_boxplot.

penguins %>% ggplot(aes(x=species,y=body_mass_g, fill = species)) +
  geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2, alpha = 0.5) +
  geom_point(position = position_jitter(width = .15), size = .8) +
   geom_boxplot(aes(x = species, y = body_mass_g, fill = species),outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
  theme_jen() +
  labs(title = "Raincloud plot of body mass by species", x = 'Species', y = 'Body mass (g)') +
  easy_remove_legend()

BEAUTIFUL!

successes/shallenges

I was expecting the raincloudplot package to make constructing raincloud plots EASIER… not more difficult. I was also expecting to like the aesthesics of the package version better. It is probably just that I am more familiar with the ggplot idea of adding layers to a plot and customising the look. I might get used to the code style of the raincloudplot package if i used it a lot, but I like that with ggplot there is nothing hidden under the hood. You add each layer, run the code and see the plot build. Makes troubleshooting much easier. The package seems to do a lot of set up at the level of the data, which isn’t so user friendly.

if you want to try raincloud plots, just copy the R_raincloud.R script from here and save it in your project. Run it using source().

footnotes

an aside about the r setup chunk

NOTE I played with the r setup chunk in this post and worked out how to make the warnings and messages not show for all chunks! woot!

The default RMarkdown set up chunk looks like this …

The default set up chunk is set to not display in your knitted document (include = FALSE) and the only setting that is specified is echo = TRUE- this makes both the code and the output display in your knitted document.

I found this post by Karl Broman that covers Global chunk options useful and now my new one for this post looks like this.

Now my setup chunk is set to control the width and height of the figures making them all consistently 8 x 6, the fig.path argument will auto-magically save each of the figures as a png in a folder called Figs each time you knit the document.

You can also get those png files named useful things by changing the label on on the chunk that generates the plot. After the {r} just put a short label and it will call the plot that when it saves.

echo is still TRUE, which means both my code and output will display but now warning and messages are FALSE, so R is no longer going to display the warnings about packages or missing values from my plots in my knitted document. Sweet!

custom ggplot theme

You will notice that I’ve added theme_jen() to my plots. You can write your own custom theme by writing a function like mine below. Mine is pretty similar theme_classic() with bigger font and more white space. I keep it in a script file that I can source at the top of my file.

theme_jen <- function () {
  
  # define font up front
  font <- "Helvetica"  
  # this theme uses theme_bw as the base 
  
  theme_bw() %+replace%   
    theme(
      #get rid of grid lines/borders
      panel.border = element_blank(), 
      panel.grid.major = element_blank(), 
      panel.grid.minor = element_blank(), 
      # add white space top, right, bottom, left
      plot.margin = unit(c(1, 1, 1, 1), "cm"), 
      # custom axis title/text/lines
      axis.title = element_text(            
        family = font,                     
        size = 14),               
      axis.text = element_text(              
        family = font,                       
        size = 12),   
      # margin pulls text away from axis
      axis.text.x = element_text(           
        margin=margin(5, b = 10)),
      # black lines
      axis.line = element_line(colour = "black", size = rel(1)), 
      # custom plot titles, subtitles, captions
      plot.title = element_text(             
        family = font,              
        size = 18,
        hjust = -0.1,
        vjust = 4),
       # custom plot subtitles
      plot.subtitle = element_text(          
        family = font,                   
        size = 14, 
        hjust = 0,
        vjust = 3),
       # custom captions
      plot.caption = element_text(           
        family = font,                   
        size = 10,
        hjust = 1,
        vjust = 2), 
      # custom legend 
      legend.title = element_text(          
        family = font,           
        size = 10,                
        hjust = 0), 
      legend.text = element_text(          
        family = font,               
        size = 8,                     
        hjust = 0), 
      #no background on legend
      legend.key = element_blank(),   
      # white background on plot
      strip.background = element_rect(fill = "white",  
                                      colour = "black", 
                                      size = rel(2)), complete = TRUE)
  
}

W6LLrain

Jen Richmond

27/03/2021

load packages

read the data

get summary stats

make a psych standard plot

`geom_point()`

`geom_jitter()`

`geom_violin()`

geom_boxplot

raincloud plot

Option 1: `raincloudplots` package

copy code from the vignette

adapt for penguins

repeated measures

Option 2: the old script method

successes/shallenges

footnotes

an aside about the r setup chunk

custom ggplot theme

W6LLrain

Jen Richmond

27/03/2021

load packages

read the data

get summary stats

make a psych standard plot

geom_point()

geom_jitter()

geom_violin()

geom_boxplot

raincloud plot

Option 1: raincloudplots package

copy code from the vignette

adapt for penguins

repeated measures

Option 2: the old script method

successes/shallenges

footnotes

an aside about the r setup chunk

custom ggplot theme

`geom_point()`

`geom_jitter()`

`geom_violin()`

Option 1: `raincloudplots` package