Mastering Advanced Data Visualization: The Lollipop Chart

Step-by-Step Evolution from Basics to High-End Art

Author

Abdullah Al Shamim

Published

May 31, 2026

Notebook Introduction (Metadata)

Aim: Create a professional visualization of mammalian sleep habits using the msleep dataset.
Key Skills: Layering (Geometry), Theme Customization (Dark Mode), Annotations, and Color Gradients.
Concept: In this tutorial, we will learn how to transform a simple plot into “art” by adding code layer by layer.

Dataset Introduction

This dataset contains information on the sleep patterns, body weights, and brain weights of 83 mammalian species. It is an eco-physiological dataset.

Variable Dictionary

Variable	Description
`name`	Common name of the animal.
`genus`	Genus the animal belongs to (taxonomic rank).
`vore`	Dietary category (carni, herbi, omni, insecti).
`order`	Taxonomic order.
`conservation`	Conservation status (e.g., lc - least concern, en - endangered).
`sleep_total`	Total sleep in 24 hours (hours).
`sleep_rem`	REM (Rapid Eye Movement) sleep (hours).
`sleep_cycle`	Length of sleep cycle (hours).
`awake`	Total time awake in 24 hours (hours).
`brainwt`	Brain weight (kilograms).
`bodywt`	Total body weight (kilograms).

Why is this dataset popular?

Relationship Analysis: Does sleep decrease as body weight increases? Questions like this make the dataset well suited to practicing correlation analysis.
Categorical Data: You can create great bar charts using dietary habits (vore) and conservation status.
Missing Value Handling: This dataset contains several NA values, making it perfect for practicing Data Cleaning.
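
As a quick illustration of the first and third points, the sketch below counts missing values per column and computes one correlation. The log10 transform of bodywt is an assumption of this sketch (body weights span several orders of magnitude), and the object names na_counts and sleep_weight_cor are introduced here only for illustration.

```r
library(ggplot2)  # msleep ships with ggplot2

# Count missing values in every column
na_counts <- colSums(is.na(msleep))
na_counts

# Correlation between log10 body weight and total sleep.
# Neither column has missing values; use = "complete.obs" is kept as a
# safeguard for columns that do (for example brainwt).
sleep_weight_cor <- cor(log10(msleep$bodywt), msleep$sleep_total,
                        use = "complete.obs")
sleep_weight_cor
```

A negative coefficient would be consistent with the idea that heavier mammals tend to sleep less, although a correlation alone does not explain why.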

# Loading the dataset
library(ggplot2)
data(msleep)

# Checking dataset structure
str(msleep)

# Summary Statistics
summary(msleep[, c("sleep_total", "awake", "bodywt", "brainwt")])

1: Load ggplot2 — provides the visualization functions and grants access to the built-in msleep dataset
2: Load msleep into the active environment from ggplot2 — makes the dataset available by name for all subsequent code
3: Inspect the structure — shows column names, R data types, and a preview of values for all 11 variables
4: Summarise four key numeric columns; returns min, Q1, median, mean, Q3, and max, plus an NA count for any column with missing values (only brainwt here)

tibble [83 × 11] (S3: tbl_df/tbl/data.frame)
 $ name        : chr [1:83] "Cheetah" "Owl monkey" "Mountain beaver" "Greater short-tailed shrew" ...
 $ genus       : chr [1:83] "Acinonyx" "Aotus" "Aplodontia" "Blarina" ...
 $ vore        : chr [1:83] "carni" "omni" "herbi" "omni" ...
 $ order       : chr [1:83] "Carnivora" "Primates" "Rodentia" "Soricomorpha" ...
 $ conservation: chr [1:83] "lc" NA "nt" "lc" ...
 $ sleep_total : num [1:83] 12.1 17 14.4 14.9 4 14.4 8.7 7 10.1 3 ...
 $ sleep_rem   : num [1:83] NA 1.8 2.4 2.3 0.7 2.2 1.4 NA 2.9 NA ...
 $ sleep_cycle : num [1:83] NA NA NA 0.133 0.667 ...
 $ awake       : num [1:83] 11.9 7 9.6 9.1 20 9.6 15.3 17 13.9 21 ...
 $ brainwt     : num [1:83] NA 0.0155 NA 0.00029 0.423 NA NA NA 0.07 0.0982 ...
 $ bodywt      : num [1:83] 50 0.48 1.35 0.019 600 ...
  sleep_total        awake           bodywt            brainwt       
 Min.   : 1.90   Min.   : 4.10   Min.   :   0.005   Min.   :0.00014  
 1st Qu.: 7.85   1st Qu.:10.25   1st Qu.:   0.174   1st Qu.:0.00290  
 Median :10.10   Median :13.90   Median :   1.670   Median :0.01240  
 Mean   :10.43   Mean   :13.57   Mean   : 166.136   Mean   :0.28158  
 3rd Qu.:13.75   3rd Qu.:16.15   3rd Qu.:  41.750   3rd Qu.:0.12550  
 Max.   :19.90   Max.   :22.10   Max.   :6654.000   Max.   :5.71200  
                                                    NA's   :27

1. Environment Setup & Data Processing

First, let’s load the necessary libraries and organize our data. We will calculate the average sleep (mean_sleep) grouped by order.

library(tidyverse)

1: Load the tidyverse — brings in dplyr for data wrangling, forcats for factor reordering, and ggplot2 for the full plot pipeline

1.1 Grouping & Summarizing

We use group_by(order) to categorize the animals and summarise() to calculate the average sleep for each group.

# Data Preparation
sleep_data <- msleep %>%
  group_by(order) %>%
  summarise(mean_sleep = mean(sleep_total))

1: Pipe msleep into a chain of transformations and store the summarised result as sleep_data
2: Partition all 83 rows by taxonomic order — each unique order becomes a separate group for downstream calculations
3: Collapse each group to a single row containing mean_sleep, the arithmetic mean of sleep_total for that order
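
A slightly extended variant of the same pattern is sketched below. It also counts the species in each order, which helps when judging how much weight a group mean deserves. The names order_summary and n_species are introduced for this sketch only, and na.rm = TRUE is a defensive assumption (sleep_total has no missing values in msleep).

```r
library(dplyr)
library(ggplot2)  # provides msleep

order_summary <- msleep %>%
  group_by(order) %>%
  summarise(
    n_species  = n(),                             # species per order
    mean_sleep = mean(sleep_total, na.rm = TRUE)  # ignore NAs, if any
  ) %>%
  arrange(desc(mean_sleep))                       # longest sleepers first

order_summary
```

Orders represented by a single species produce a "mean" that is really just one observation, which is worth keeping in mind when reading the final chart.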

1.2 Factor Reordering

To make the visualization aesthetically pleasing, we use fct_reorder(). This sorts the categories from lowest to highest based on the mean sleep values.

sleep_data <- sleep_data %>%
  mutate(order = fct_reorder(order, mean_sleep))

1: Overwrite sleep_data in place — appends the reordering step without creating a new object
2: fct_reorder() converts order to a factor whose levels are sorted by mean_sleep ascending — ggplot2 reads factor levels for axis ordering, so this produces a naturally sorted lollipop chart
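
To see what fct_reorder() does in isolation, here is a minimal sketch on toy data. The data frame toy and its values are hypothetical and exist only to make the level ordering easy to verify by eye.

```r
library(forcats)

# Toy data: three groups with one value each (hypothetical values)
toy <- data.frame(group = c("a", "b", "c"), value = c(3, 1, 2))

# Default factor levels are alphabetical
levels(factor(toy$group))  # "a" "b" "c"

# fct_reorder() sorts the levels by the numeric column instead
ascending  <- fct_reorder(toy$group, toy$value)
descending <- fct_reorder(toy$group, toy$value, .desc = TRUE)

levels(ascending)   # "b" "c" "a"
levels(descending)  # "a" "c" "b"
```

Passing .desc = TRUE is one way to flip the chart so the longest sleepers appear first.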

2. Step-by-Step Plot Evolution

Step 1: The Basic Canvas

We start by defining the X and Y axes and adding labels.

plot_step1 <- sleep_data %>% 
  ggplot(aes(order, mean_sleep)) +
  labs(title = "Average Sleep Time of Mammals by Order",
       x = "",
       y = "Hours")

plot_step1

1: Initialize the canvas by mapping order to the x-axis and mean_sleep to the y-axis; no geometry is added yet, so only the coordinate system and axes render
2: Attach labels — the empty string for x suppresses the redundant axis title while y = "Hours" clarifies the measurement unit

Step 2: Styling (Custom Theme & Dark Mode)

Now we will darken the background and change text colors to create a “Neon Vibe.”

2.1 Axis Labels Style

We will tilt the X-axis labels by 45° for better readability and apply specific hex codes for color.

plot_step2 <- plot_step1 +
  theme(
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, color = "#d580ff"),
    axis.text.y = element_text(color = "#eabfff")
  )

plot_step2

1: Rotate x-axis labels 45° and right-align with vjust/hjust to prevent overlap across the many order names; apply neon purple #d580ff
2: Color y-axis tick labels with lighter lavender #eabfff — a two-tone neon palette that creates visual depth

2.2 Title & Axis Title Style

We make the main title bold and large, and color both it and the y-axis title in neon purple.

plot_step2 <- plot_step2 +
  theme(
    axis.title.y = element_text(color = "#d580ff"),
    plot.title = element_text(color = "#d580ff", face = "bold", size = 20)
  )

plot_step2

1: Color the y-axis title to match the neon purple scheme — unifies all text elements under the same palette
2: Set the plot title to bold, neon purple, at size = 20 — makes it the dominant visual anchor of the dark-theme chart

2.3 Axis Lines & Ticks Style

We apply the neon color to the axis lines and tick marks.

plot_step2 <- plot_step2 +
  theme(
    axis.line = element_line(color = "#d580ff"),
    axis.ticks = element_line(color = "#d580ff")
  )

plot_step2

1: Apply neon purple to the structural axis border lines — ties together all non-data chart elements with a single accent color
2: Match the tick mark color to the axis lines — eliminates any visual inconsistency for a cohesive neon look

2.4 Background Colors

To give the plot a “Dark Theme,” we set both panel.background and plot.background to black.

plot_step2 <- plot_step2 +
  theme(
    panel.background = element_rect(fill = "black"),
    plot.background = element_rect(fill = "black", color = "black")  # color also darkens the default white outline
  )

plot_step2

1: Set the inner plotting panel to solid black — maximizes contrast so the neon purple and lavender elements pop visually
2: Extend the black fill to the outer plot region including margins and the title area — completes the edge-to-edge dark theme

2.5 Cleaning Grids & Legend

To achieve a modern look, we remove all grid lines and turn off the legend in advance, since the color gradient added in Step 4 would otherwise create one.

plot_step2 <- plot_step2 +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "none"
  )

plot_step2

1: Remove all major grid lines — a clean black panel is more visually impactful than gridded clutter on a dark theme
2: Remove all minor grid lines — keeps the panel completely bare so only the lollipop geometry draws the eye
3: Suppress the legend: none exists yet, but the color gradient added in Step 4 would otherwise create one, and the height of each point already encodes mean_sleep
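
The theme settings from sections 2.1 to 2.5 can also be bundled into a single reusable function, which may be convenient when several plots should share the same look. The sketch below is one possible way to do that. The function name theme_neon and its arguments are hypothetical, and the demo plot uses vore and sleep_total simply as a stand-in.

```r
library(ggplot2)

# Hypothetical helper: returns the Step 2 settings as one theme object
theme_neon <- function(accent = "#d580ff", light = "#eabfff", bg = "black") {
  theme(
    axis.text.x      = element_text(angle = 45, vjust = 1, hjust = 1,
                                    color = accent),
    axis.text.y      = element_text(color = light),
    axis.title.y     = element_text(color = accent),
    plot.title       = element_text(color = accent, face = "bold", size = 20),
    axis.line        = element_line(color = accent),
    axis.ticks       = element_line(color = accent),
    panel.background = element_rect(fill = bg),
    plot.background  = element_rect(fill = bg),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position  = "none"
  )
}

# Stand-in plot to try the theme on
demo_plot <- ggplot(msleep, aes(vore, sleep_total)) +
  geom_point(color = "#eabfff") +
  labs(title = "Theme demo", x = "", y = "Hours") +
  theme_neon()

demo_plot
```

Exposing the colors as arguments keeps the palette in one place, so switching to a different accent color would not require touching every element.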

Step 3: The Lollipop Base (Reference Line & Sticks)

We add a benchmark line for the overall average sleep and connecting “sticks” for each point.

plot_step3 <- plot_step2 +
  # Horizontal line for overall average sleep
  geom_hline(yintercept = mean(msleep$sleep_total),
             color = "#d580ff", linewidth = 1) +
  # Segments to show distance from the mean line
  geom_segment(aes(x = order,
                   y = mean(msleep$sleep_total), 
                   xend = order, 
                   yend = mean_sleep), 
               color = "#d580ff")

plot_step3

1: Draw a horizontal reference line at the grand mean of sleep_total across all 83 mammals; it serves as the shared baseline from which each lollipop stick extends up or down
2: Draw vertical segments from the grand mean up (or down) to each order’s mean_sleep — these are the “sticks” of the lollipop, visually encoding how far each order deviates from the average
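
The quantity each stick encodes can be computed explicitly, which makes it easier to check the chart against the numbers. The sketch below assumes the same grouping as in section 1.1. The names grand_mean, stick_data, deviation, and direction are introduced here for illustration.

```r
library(dplyr)
library(ggplot2)  # provides msleep

# Mean over all 83 species (not the mean of the order means)
grand_mean <- mean(msleep$sleep_total)

stick_data <- msleep %>%
  group_by(order) %>%
  summarise(mean_sleep = mean(sleep_total)) %>%
  mutate(
    deviation = mean_sleep - grand_mean,  # signed stick length in hours
    direction = if_else(deviation >= 0, "above", "below")
  ) %>%
  arrange(deviation)

stick_data
```

Note that the baseline is the species-level mean, so the sticks above and below it do not have to balance out across orders.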

Step 4: Adding Color Gradient Points

We place points at the end of the sticks that change color based on the amount of sleep.

plot_step4 <- plot_step3 +
  geom_point(aes(color = mean_sleep), size = 5) +
  scale_color_gradient(low = "purple", high = "#eabfff")

plot_step4

1: Place filled circles at the tip of every stick with size = 5; mapping color to mean_sleep enables a gradient that encodes sleep duration as a second visual channel
2: Define the gradient range — orders with the least sleep get deep purple while the highest-sleep orders receive bright lavender #eabfff, creating a natural glow effect
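
Roughly speaking, a continuous color scale first rescales the data to the range 0 to 1 and then interpolates between the two endpoint colors. The sketch below mimics that idea with the scales package, which is installed alongside ggplot2. It is a simplified illustration, not the exact internals of scale_color_gradient(), and the mean_sleep values are hypothetical.

```r
library(scales)

# Palette function that interpolates between the two colors used above
neon_pal <- seq_gradient_pal(low = "purple", high = "#eabfff")

# Hypothetical mean-sleep values in hours, for illustration only
mean_sleep <- c(4, 10, 16)

# Rescale to [0, 1], the input range the palette function expects
position <- rescale(mean_sleep)
position  # 0.0 0.5 1.0

# Three hex codes running from purple to lavender
colors <- neon_pal(position)
colors
```

Because the mapping is relative to the data range, the same order could receive a different shade if the set of orders in the plot changes.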

Step 5: The Final Touch (Annotations & Arrows)

Finally, we add an explanatory text and a curved arrow for better storytelling.

final_plot <- plot_step4 +
  # Adding explanatory text
  annotate("text", x = 4, y = max(msleep$sleep_total) - 4,
           label = "Average sleep\nfor all mammals", 
           color = "#eabfff", size = 4, fontface = "bold", hjust = 0) +
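  # Adding a curved arrow pointing to the mean line.
  # annotate("curve") draws the arrow once; geom_curve() with constant aes()
  # values would redraw it for every row of sleep_data.
  annotate("curve", x = 3.7, y = max(msleep$sleep_total) - 5,
           xend = 1.5, yend = mean(msleep$sleep_total),
           color = "#eabfff", curvature = 0.5,
           arrow = arrow(length = unit(0.07, "npc"), type = "open"))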

final_plot

1: Place a two-line bold label at x = 4 near the top of the chart; hjust = 0 left-aligns the text from that anchor so it reads naturally into open whitespace
2: Draw a curved arrow from just below the text annotation down to the mean reference line — guides the reader’s eye from the label to the exact benchmark it describes
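
Numeric x values such as 4 or 1.5 work on this axis because ggplot2 places each factor level at an integer position (1, 2, 3, and so on) in level order. The sketch below rebuilds sleep_data as in section 1 and lists those positions, so annotation anchors can be chosen deliberately. The name x_positions is introduced for this sketch only.

```r
library(dplyr)
library(forcats)
library(ggplot2)  # provides msleep

sleep_data <- msleep %>%
  group_by(order) %>%
  summarise(mean_sleep = mean(sleep_total)) %>%
  mutate(order = fct_reorder(order, mean_sleep))

# Each factor level sits at an integer x position: 1, 2, 3, ...
x_positions <- tibble(
  x     = seq_along(levels(sleep_data$order)),
  order = levels(sleep_data$order)
)

x_positions

# The order closest to the label anchor at x = 4
x_positions$order[4]
```

If the data or the sort order changes, hard-coded positions may end up on top of a lollipop, so it can be worth rechecking this lookup before finalizing the annotation.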

Key Visual Toolkit (Cheat Sheet)

fct_reorder(): Used to sort categories based on numeric values.
theme(panel.background = ...): Customizes the internal plot background.
geom_segment(): Creates the “sticks” for the lollipop chart.
geom_curve() & annotate(): Highlight or explain specific parts of the graph.
scale_color_gradient(): Shows color shifts for continuous data.

Congratulations! You have successfully transformed a simple plot into a high-end visualization.