Raincloud Plots

What they are for

A raincloud plot combines:

a half violin plot or density plot
a boxplot
individual data points

This makes it useful for showing:

the distribution of values
the summary statistics
the raw observations

Use a raincloud plot when you want to compare the distribution of a numeric variable across groups.
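
Each of the three layers corresponds to something you can compute directly. The sketch below is only an illustration in base R: the `mass` vector is simulated (it is not the penguin data), and the names `cloud`, `box`, and `rain` are made up for this example.

```r
# Hypothetical data: 50 simulated body masses (g); not the penguin data.
set.seed(1)
mass <- rnorm(50, mean = 4200, sd = 800)

# "Cloud": a kernel density estimate of the values.
cloud <- density(mass)

# Box: the quartiles that form the box and its median line.
box <- quantile(mass, probs = c(0.25, 0.5, 0.75))

# "Rain": the raw observations themselves.
rain <- mass

print(round(box))
```

The plotting functions used below do this work for you; the point here is only that the density, the quartiles, and the raw values are three different views of the same vector.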

# Install if needed:
# install.packages(c("ggplot2", "ggdist", "palmerpenguins", "dplyr"))

library(ggplot2)
library(ggdist)
library(palmerpenguins)

## 
## Attaching package: 'palmerpenguins'

## The following objects are masked from 'package:datasets':
## 
##     penguins, penguins_raw

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

penguins_clean <- penguins %>%
  select(species, body_mass_g) %>%
  na.omit()

ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = species)) +
  stat_halfeye(
    adjust = 0.5,
    width = 0.6,
    justification = -0.2,
    .width = 0,
    point_colour = NA
  ) +
  geom_boxplot(
    width = 0.12,
    outlier.shape = NA,
    alpha = 0.5
  ) +
  geom_jitter(
    width = 0.08,
    alpha = 0.5,
    size = 1.5
  ) +
  labs(
    title = "Raincloud Plot of Penguin Body Mass",
    x = "Species",
    y = "Body Mass (g)"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

If you have a second category you want to show, map it to fill and dodge each layer:

library(ggplot2)
library(ggdist)
penguins_clean <- penguins %>%
  select(species, sex, body_mass_g) %>%
  na.omit()

ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  
  # half-eye distribution
  stat_halfeye(
    position = position_dodge(width = 0.75),
    adjust = 0.6,
    width = 0.55,
    .width = 0,
    justification = -0.2,
    point_colour = NA,
    alpha = 0.5
  ) +
  
  # boxplot summary
  geom_boxplot(
    aes(color = sex),
    width = 0.12,
    position = position_dodge(width = 0.75),
    outlier.shape = NA,
    alpha = 0.65,
    linewidth = 0.5
  ) +
  
  # raw data points
  geom_jitter(
    aes(color = sex),
    position = position_jitterdodge(
      jitter.width = 0.08,
      dodge.width = 0.75
    ),
    size = 1.8,
    alpha = 0.25
  ) +
  
  labs(
    title = "Penguin Body Mass by Species and Sex",
    subtitle = "Raincloud plot showing distribution, summary statistics, and individual observations",
    x = "Species",
    y = "Body Mass (g)",
    fill = "Sex",
    color = "Sex"
  ) +
  
  scale_fill_manual(values = c("female" = "mistyrose3", "male" = "darkseagreen3")) +
  scale_color_manual(values = c("female" = "indianred3", "male" = "seagreen4")) +
  
  theme_classic(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    axis.title = element_text(face = "bold"),
    legend.position = "right"
  )

To line the points up in tidy stacks instead of jittering them, swap geom_jitter() for geom_dotplot():

library(ggplot2)
library(ggdist)
library(palmerpenguins)
library(dplyr)

penguins_clean <- penguins %>%
  filter(!is.na(species), !is.na(body_mass_g)) %>%
  mutate(species = factor(species, levels = c("Adelie", "Chinstrap", "Gentoo")))

ggplot(penguins_clean, aes(x = species, y = body_mass_g)) +
  
  # half violin (raincloud shape)
  stat_halfeye(
    adjust = 0.6,
    width = 0.6,
    .width = 0,
    justification = -0.3,
    point_colour = NA,
    fill = "#74a9cf",
    alpha = 0.7
  ) +
  
  # boxplot
  geom_boxplot(
    width = 0.12,
    outlier.shape = NA,
    fill = "white",
    color = "black",
    linewidth = 0.8
  ) +
  
  # points (aligned dots instead of jitter chaos)
  geom_dotplot(
    binaxis = "y",
    stackdir = "down",
    binwidth = 120,  # set explicitly; about 1/30 of the body-mass range
    dotsize = 0.6,
    fill = "gray40",
    alpha = 0.7
  ) +
  
  labs(
    title = "Penguin Body Mass by Species",
    x = "Species",
    y = "Body Mass (g)"
  ) +
  
  theme_classic(base_size = 16) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.title = element_text(face = "bold")
  )

Sina Plots

A sina plot is similar to a jitter plot, but the points are spread based on the density of the data. That means:

where there are more points, the plot becomes wider
where there are fewer points, it stays narrow

This makes it a nice alternative to:

strip plots
jitter plots
violin plots

It shows both individual observations and distribution shape.

Use a sina plot when you want to show raw data points without them overlapping too much, while also giving a sense of density.

Points are not scattered arbitrarily: their horizontal spread follows the density of the data.
Wider sections indicate a greater concentration of values.
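
The idea of "spread based on density" can be sketched in a few lines of base R. This is a conceptual illustration only, not the algorithm ggforce actually uses; the data are simulated and `max_half_width` is an arbitrary choice made for this example.

```r
# Hypothetical data: 200 simulated values; not the penguin data.
set.seed(42)
y <- rnorm(200, mean = 200, sd = 14)

# Estimate the density, then look it up at every observation.
dens <- density(y)
dens_at_y <- approx(dens$x, dens$y, xout = y)$y

# Scale so the densest region gets the maximum half-width.
max_half_width <- 0.4
half_width <- max_half_width * dens_at_y / max(dens_at_y)

# Offset each point at random, but only within its own half-width.
x_offset <- runif(length(y), min = -half_width, max = half_width)
```

Points near the middle of the distribution get a large `half_width` and can move far sideways, while points in the tails stay close to the center line. A plain jitter plot would use the same width everywhere.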

# Install if needed:
# install.packages(c("ggplot2", "ggforce", "palmerpenguins", "dplyr"))

library(ggplot2)
library(ggforce)
library(palmerpenguins)
library(dplyr)

penguins_clean <- penguins %>%
  select(species, flipper_length_mm) %>%
  na.omit()

ggplot(penguins_clean, aes(x = species, y = flipper_length_mm, color = species)) +
  geom_sina(alpha = 0.7, size = 3) +
  labs(
    title = "Sina Plot of Penguin Flipper Length",
    x = "Species",
    y = "Flipper Length (mm)"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Cleveland Plots

A Cleveland dot plot is used to compare values across categories using dots instead of bars.

It is helpful because:

it avoids the visual bulk of barplots
it makes it easier to compare positions along a shared axis
it works especially well when there are many groups

Use a Cleveland dot plot when comparing one summary value across categories.

Examples:

average expression by gene
average body mass by species
percentage of students by major

This example compares the average body mass of penguin species.

# Install if needed:
# install.packages(c("ggplot2", "palmerpenguins", "dplyr"))

library(ggplot2)
library(palmerpenguins)
library(dplyr)

species_summary <- penguins %>%
  group_by(species) %>%
  summarise(mean_body_mass = mean(body_mass_g, na.rm = TRUE)) %>%
  arrange(mean_body_mass)

ggplot(species_summary, aes(x = mean_body_mass, y = reorder(species, mean_body_mass))) +
  geom_point(size = 4) +
  labs(
    title = "Cleveland Dot Plot of Mean Penguin Body Mass",
    x = "Mean Body Mass (g)",
    y = "Species"
  ) +
  theme_minimal()
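
The ordering in this plot comes from reorder(), not from arrange(): ggplot2 orders a discrete axis by factor levels, and arrange() only sorts the rows. A small sketch with a made-up table (the `toy` data frame and its values are hypothetical) shows what reorder() does to the levels:

```r
# Hypothetical summary table; the values are made up for illustration.
toy <- data.frame(
  group = c("A", "B", "C"),
  value = c(30, 10, 20)
)

# Default factor levels are alphabetical: "A" "B" "C".
levels(factor(toy$group))

# reorder() sorts the levels by the second argument instead: "B" "C" "A".
ordered_group <- reorder(toy$group, toy$value)
levels(ordered_group)
```

Sorting the categories by value is what makes a Cleveland dot plot easy to scan, so it is usually worth keeping the reorder() call in the aesthetic mapping.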

If you have groups…

grouped_summary <- penguins %>%
  filter(!is.na(sex)) %>%  # drop birds of unknown sex so no NA group is plotted
  group_by(species, sex) %>%
  summarise(mean_bill_length = mean(bill_length_mm, na.rm = TRUE), .groups = "drop")

ggplot(grouped_summary, aes(x = mean_bill_length, y = species, color = sex)) +
  geom_point(size = 3, position = position_dodge(width = 0.4)) +
  labs(
    title = "Grouped Cleveland Dot Plot of Mean Bill Length",
    x = "Mean Bill Length (mm)",
    y = "Species",
    color = "Sex"
  ) +
  theme_minimal()

Forest Plots

A forest plot shows:

a point estimate (like a mean or effect size)
a confidence interval (uncertainty around that estimate)

Use a forest plot when:

you are comparing estimates across groups
you want to include uncertainty
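
The intervals in the examples below are computed as mean ± 1.96 × standard error, which is a normal approximation. The sketch here works through that arithmetic on a small made-up sample (the vector `x` is hypothetical) and, as an optional comparison not used elsewhere in this document, shows the t-based interval.

```r
# Hypothetical sample of body masses (g); made up for illustration.
x <- c(3500, 3700, 3900, 4100, 4300)

n <- length(x)
m <- mean(x)
se <- sd(x) / sqrt(n)

# Normal approximation, as used in the examples in this document.
ci_normal <- c(lower = m - 1.96 * se, upper = m + 1.96 * se)

# t-based interval: wider for small n, nearly identical for large n.
t_crit <- qt(0.975, df = n - 1)
ci_t <- c(lower = m - t_crit * se, upper = m + t_crit * se)

print(round(ci_normal, 1))
print(round(ci_t, 1))
```

With only five observations the t-based interval is noticeably wider. For group sizes like those in the penguin data, the two intervals are close, so 1.96 is a reasonable shortcut.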

library(ggplot2)
library(dplyr)
library(palmerpenguins)

summary_data <- penguins %>%
  group_by(species) %>%
  summarise(
    mean_mass = mean(body_mass_g, na.rm = TRUE),
    sd = sd(body_mass_g, na.rm = TRUE),
    n = sum(!is.na(body_mass_g)),  # count only non-missing values, to match mean() and sd()
    se = sd / sqrt(n),
    lower = mean_mass - 1.96 * se,
    upper = mean_mass + 1.96 * se
  )


ggplot(summary_data, aes(x = mean_mass, y = reorder(species, mean_mass))) +
  geom_point(size = 4) +
  # horizontal error bars; geom_errorbarh() is deprecated as of ggplot2 4.0.0
  geom_errorbar(aes(xmin = lower, xmax = upper), width = 0.2, orientation = "y") +
  labs(
    title = "Forest Plot of Mean Penguin Body Mass",
    x = "Mean Body Mass (g) with 95% CI",
    y = "Species"
  ) +
  theme_minimal()

If you have grouped data…

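summary_grouped <- penguins %>%
  filter(!is.na(sex)) %>%  # drop birds of unknown sex so no NA group is plotted
  group_by(species, sex) %>%
  summarise(
    mean_mass = mean(body_mass_g, na.rm = TRUE),
    sd = sd(body_mass_g, na.rm = TRUE),
    n = sum(!is.na(body_mass_g)),  # count only non-missing values
    se = sd / sqrt(n),
    lower = mean_mass - 1.96 * se,
    upper = mean_mass + 1.96 * se,
    .groups = "drop"
  )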

ggplot(summary_grouped, aes(x = mean_mass, y = species, color = sex)) +
  geom_point(position = position_dodge(width = 0.5), size = 3) +
  # horizontal error bars; geom_errorbarh() is deprecated as of ggplot2 4.0.0
  geom_errorbar(
    aes(xmin = lower, xmax = upper),
    position = position_dodge(width = 0.5),
    width = 0.2,
    orientation = "y"
  ) +
  labs(
    title = "Forest Plot of Body Mass by Species and Sex",
    x = "Mean Body Mass (g) with 95% CI",
    y = "Species",
    color = "Sex"
  ) +
  theme_minimal()

Homework

You will be using this dataset for the homework:

library(ggplot2)
library(dplyr)
library(palmerpenguins)

penguins_clean <- penguins %>%
  filter(!is.na(species), !is.na(sex), !is.na(body_mass_g))

Part 1. Create one plot of your choice to visualize the relationship between:

Species
Mass
Sex

# load the packages needed
library(ggplot2)
library(dplyr)
library(ggdist)

#data set used 
penguins_clean <- penguins %>%
  filter(!is.na(species), !is.na(sex), !is.na(body_mass_g))

#code used 
ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  
  # half-eye distribution
  stat_halfeye(
    position = position_dodge(width = 0.75),  # same dodge width as the points so the layers line up
    adjust = 0.6,
    width = 0.8,
    .width = 0,
    justification = -0.3,
    point_colour = NA,
    alpha = 0.8
  ) +
  
  # boxplot summary
  geom_boxplot(
    aes(color = sex),
    width = 0.10,
    position = position_dodge(width = 0.75),  # same dodge width as the points so the layers line up
    outlier.shape = NA,
    alpha = 0.75,
    linewidth = 0.5
  ) +
  
  # raw data points
  geom_jitter(
    aes(color = sex),
    position = position_jitterdodge(
      jitter.width = 0.08,
      dodge.width = 0.75
    ),
    size = 1.8,
    alpha = 0.25
  ) +
  
  labs(
    title = "Penguin Body Mass by Species and Sex",
    subtitle = "Raincloud plot showing distribution, summary statistics, and individual observations",
    x = "Species",
    y = "Body Mass (g)",
    fill = "Sex",
    color = "Sex"
  ) +
  
  scale_fill_manual(values = c("female" = "#E76F51", "male" = "#2A9D8F")) +
  scale_color_manual(values = c("female" = "#9C3D2E", "male" = "#1F776E")) +
  
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    axis.title = element_text(face = "bold"),
    legend.position = "right"
  )

What plot type did you choose? I chose to create a raincloud plot.
Why is this plot appropriate for this data?

This plot is appropriate because a raincloud plot shows the distribution of a continuous variable across categorical groups. It also combines elements of a boxplot and a violin plot.

What patterns do you observe? The plot shows that, within each species, male penguins tend to have a higher body mass than female penguins. It also shows that Gentoo penguins have the highest body mass of the three species.

Part 2. Create a raincloud plot showing body mass across species.

# load the packages needed
library(ggplot2)
library(dplyr)
library(ggdist)

#data set used 
penguins_clean <- penguins %>%
  filter(!is.na(species), !is.na(sex), !is.na(body_mass_g))

#code used 
ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  
  # half-eye distribution
  stat_halfeye(
    position = position_dodge(width = 0.75),  # same dodge width as the points so the layers line up
    adjust = 0.6,
    width = 0.8,
    .width = 0,
    justification = -0.3,
    point_colour = NA,
    alpha = 0.8
  ) +
  
  # boxplot summary
  geom_boxplot(
    aes(color = sex),
    width = 0.10,
    position = position_dodge(width = 0.75),  # same dodge width as the points so the layers line up
    outlier.shape = NA,
    alpha = 0.75,
    linewidth = 0.5
  ) +
  
  # raw data points
  geom_jitter(
    aes(color = sex),
    position = position_jitterdodge(
      jitter.width = 0.08,
      dodge.width = 0.75
    ),
    size = 1.8,
    alpha = 0.25
  ) +
  labs(
    title = "Raincloud Plot of Body Mass by Species",
    x = "Species",
    y = "Body Mass (g)"
  ) +
  theme_minimal()

Which species has the highest body mass? Gentoo penguins have the highest body mass.
Which species shows the greatest variability?

The Gentoo group shows the greatest variability and covers the largest overall range.

What does this plot show that a boxplot alone would not? A raincloud plot makes the shape of the distribution and the actual data points visible, while a boxplot alone hides both. The raincloud gives a richer picture of how body mass is distributed within a species.

Part 3. Create a forest plot. Now summarize the data and visualize uncertainty.

# Don't forget, you will need to make summary data. 
penguins_clean <- penguins %>%
  filter(!is.na(species), !is.na(sex), !is.na(body_mass_g))


# Create summary data
summary_data <- penguins_clean %>%
  group_by(species, sex) %>%
  summarise(
    mean_mass = mean(body_mass_g),
    sd = sd(body_mass_g),
    n = n(),
    se = sd / sqrt(n),
    ci_lower = mean_mass - 1.96 * se,
    ci_upper = mean_mass + 1.96 * se,
    .groups = "drop"
  )

# Create forest plot
ggplot(summary_data, aes(x = mean_mass, y = interaction(species, sex), color = sex)) +
  geom_point(size = 4) +
  # horizontal error bars; geom_errorbarh() is deprecated as of ggplot2 4.0.0
  geom_errorbar(aes(xmin = ci_lower, xmax = ci_upper), width = 0.3, orientation = "y") +
  labs(
    title = "Forest Plot of Penguin Body Mass",
    x = "Mean Body Mass (g)",
    y = "Species and Sex"
  ) +
  theme_minimal()

Which group has the highest mean body mass? Gentoo males have the highest mean body mass according to my graph.
Which group has the widest confidence interval? Why? The female Chinstrap and female Gentoo penguins have the widest confidence intervals because some of the sex-species combinations have few observations, which increases the uncertainty in the mean estimate.
Do any groups appear clearly different (based on overlap)? Yes, some of the groups appear different based on interval overlap. The Gentoo penguins have a higher mean body mass, so their confidence intervals show little overlap with those of the Adelie penguins. This indicates a clear body mass difference between those species.

Raincloud, Sina, Cleveland and Correlation

2026-04-16