Stellar Classification: A Hertzsprung-Russell Diagram

Author

Zyam Jadoon Khawaja

Published

May 15, 2026

Setup

Code
# Load required libraries
library(tidyverse)
library(plotly)
library(dslabs)

The Dataset

Code
# Load the stars dataset from dslabs
data(stars)

# Preview the data
glimpse(stars)
Rows: 96
Columns: 4
$ star      <fct> Sun, SiriusA, Canopus, Arcturus, AlphaCentauriA, Vega, Capel…
$ magnitude <dbl> 4.8, 1.4, -3.1, -0.4, 4.3, 0.5, -0.6, -7.2, 2.6, -5.7, -2.4,…
$ temp      <int> 5840, 9620, 7400, 4590, 5840, 9900, 5150, 12140, 6580, 3200,…
$ type      <chr> "G", "A", "F", "K", "G", "A", "G", "B", "F", "M", "B", "B", …

The stars dataset from the dslabs package contains astronomical data on 96 stars: each star's name (star), absolute magnitude (magnitude, where lower values mean brighter stars), surface temperature in kelvin (temp), and spectral class (type). This dataset is well suited to recreating a Hertzsprung-Russell (H-R) diagram, one of the most important visualizations in astronomy, which plots stellar temperature against magnitude to reveal how stars fall into distinct groups: main sequence stars, giants, supergiants, and white dwarfs.
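Because magnitude is a logarithmic scale that runs "backwards", a short calculation can make the numbers more concrete. The sketch below assumes the standard magnitude-luminosity relation (a difference of 5 magnitudes corresponds to a factor of 100 in luminosity) and treats the dataset's magnitudes as a rough proxy for total light output. The variable name luminosity_vs_sun is illustrative and not part of dslabs.

```r
# Magnitudes for three stars, copied from the glimpse() output above
magnitude <- c(Sun = 4.8, SiriusA = 1.4, Canopus = -3.1)

# Assumption: standard magnitude-luminosity relation, using the Sun
# (magnitude 4.8 in this dataset) as the reference point.
# Every 5 magnitudes corresponds to a factor of 100 in luminosity.
luminosity_vs_sun <- 10^(0.4 * (magnitude[["Sun"]] - magnitude))

# The Sun comes out as exactly 1; lower magnitudes give larger values
round(luminosity_vs_sun, 1)
```

Read this way, a star with a negative magnitude such as Canopus is far more luminous than the Sun, which is why the sign of magnitude matters when plotting.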


Data Preparation

Code
# Inspect unique star types to understand the categorical variable
# This tells us how many groups we'll need to color
unique(stars$type)
 [1] "G"  "A"  "F"  "K"  "B"  "M"  "O"  "DA" "DF" "DB"
Code
# Count stars per type
stars %>%
  count(type, sort = TRUE)
   type  n
1     M 32
2     B 19
3     K 16
4     A 13
5     F  7
6     G  4
7    DA  2
8    DB  1
9    DF  1
10    O  1
Code
# Create a cleaned version of the dataset with:
# 1. A log10-transformed magnitude column (note: magnitude is already a
#    logarithmic scale, and abs() discards the sign of negative magnitudes)
# 2. A size variable derived from temperature (cooler = larger point)
# 3. A readable star type label

stars_clean <- stars %>%
  mutate(
    log_magnitude = log10(abs(magnitude) + 1),
    # Invert temperature so that cooler stars get larger values;
    # the +1000 offset keeps the hottest star's size above zero
    point_size = max(temp, na.rm = TRUE) - temp + 1000,
    # Give each type a readable label; types not listed (O, DF) become "Other"
    type_grouped = case_when(
      type == "G"  ~ "G (Sun-like)",
      type == "K"  ~ "K (Orange)",
      type == "M"  ~ "M (Red Dwarf)",
      type == "B"  ~ "B (Blue-White)",
      type == "A"  ~ "A (White)",
      type == "F"  ~ "F (Yellow-White)",
      type == "DA" ~ "DA (White Dwarf)",
      type == "DB" ~ "DB (White Dwarf)",
      TRUE         ~ "Other"
    )
  )

glimpse(stars_clean)
Rows: 96
Columns: 7
$ star          <fct> Sun, SiriusA, Canopus, Arcturus, AlphaCentauriA, Vega, C…
$ magnitude     <dbl> 4.8, 1.4, -3.1, -0.4, 4.3, 0.5, -0.6, -7.2, 2.6, -5.7, -…
$ temp          <int> 5840, 9620, 7400, 4590, 5840, 9900, 5150, 12140, 6580, 3…
$ type          <chr> "G", "A", "F", "K", "G", "A", "G", "B", "F", "M", "B", "…
$ log_magnitude <dbl> 0.7634280, 0.3802112, 0.6127839, 0.1461280, 0.7242759, 0…
$ point_size    <dbl> 28760, 24980, 27200, 30010, 28760, 24700, 29450, 22460, …
$ type_grouped  <chr> "G (Sun-like)", "A (White)", "F (Yellow-White)", "K (Ora…
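The point_size column above is computed from temp, so it grows as stars get cooler. If the goal is for larger points to mean brighter stars, one option is to invert magnitude instead. The following is a minimal sketch of that alternative, not the method used in this report: brightness_size is a hypothetical column name, and the six rows are copied from the glimpse() output rather than loaded from dslabs.

```r
# Six rows copied from the glimpse() output above (not the full dataset)
stars_sample <- data.frame(
  star      = c("Sun", "SiriusA", "Canopus", "Arcturus", "AlphaCentauriA", "Vega"),
  magnitude = c(4.8, 1.4, -3.1, -0.4, 4.3, 0.5),
  temp      = c(5840, 9620, 7400, 4590, 5840, 9900)
)

# Hypothetical alternative to point_size: flip magnitude so that a
# lower magnitude (a brighter star) produces a larger value.
# The +1 keeps the faintest star's size above zero.
stars_sample$brightness_size <-
  max(stars_sample$magnitude, na.rm = TRUE) - stars_sample$magnitude + 1

# List the sample with the brightest star first
stars_sample[order(-stars_sample$brightness_size),
             c("star", "magnitude", "brightness_size")]
```

In this sample Canopus, the star with the lowest magnitude, receives the largest size value, and the Sun receives the smallest.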

Visualization: Hertzsprung-Russell Diagram

Code
# Custom non-default color palette — space-inspired colors for star types
star_colors <- c(
  "B (Blue-White)"   = "#5B8CFF",   # blue
  "A (White)"        = "#C8D8FF",   # pale blue-white
  "F (Yellow-White)" = "#FFFACD",   # light yellow
  "G (Sun-like)"     = "#FFD700",   # gold
  "K (Orange)"       = "#FFA040",   # orange
  "M (Red Dwarf)"    = "#FF4500",   # red-orange
  "DA (White Dwarf)" = "#E0E0FF",   # pale white
  "DB (White Dwarf)" = "#B0B0FF",   # lavender
  "Other"            = "#AAAAAA"    # gray
)

ggplot(stars_clean, aes(
  x     = temp,
  y     = log_magnitude,
  color = type_grouped,
  size  = point_size
)) +
  geom_point(alpha = 0.85) +

  # Reverse x-axis: astronomers plot hot stars on the LEFT
  scale_x_reverse(
    labels = scales::comma,
    breaks = c(3000, 5000, 10000, 20000, 40000)
  ) +

  # Custom non-default colors
  scale_color_manual(
    values = star_colors,
    name   = "Star Type (Spectral Class)"
  ) +

  # Size scale: point_size is derived from temp, so larger points are cooler stars
  scale_size_continuous(
    name   = "Coolness\n(relative)",
    range  = c(2, 10),
    guide  = guide_legend(override.aes = list(color = "white"))
  ) +

  # Labels
  labs(
    title    = "The Hertzsprung-Russell Diagram: Temperature vs. Magnitude",
    subtitle = "Each point is a star; hotter stars are plotted to the left, per astronomical convention",
    x        = "Surface Temperature (Kelvin)",
    y        = "Log Magnitude (transformed)",
    caption  = "Data source: dslabs R package (stars dataset) | Hertzsprung-Russell diagram"
  ) +

  # Non-default dark space theme
  theme_dark(base_size = 13) +
  theme(
    plot.background  = element_rect(fill = "#0A0A1A", color = NA),
    panel.background = element_rect(fill = "#0D0D2B", color = NA),
    panel.grid.major = element_line(color = "#1E1E3A", linewidth = 0.5),
    panel.grid.minor = element_line(color = "#141430", linewidth = 0.3),
    plot.title       = element_text(color = "white", face = "bold", size = 14),
    plot.subtitle    = element_text(color = "#A0A0C0", size = 11),
    plot.caption     = element_text(color = "#606080", size = 9, hjust = 0),
    axis.text        = element_text(color = "#A0A0C0"),
    axis.title       = element_text(color = "white"),
    legend.background = element_rect(fill = "#0A0A1A", color = NA),
    legend.text      = element_text(color = "#C0C0D0"),
    legend.title     = element_text(color = "white"),
    legend.key       = element_rect(fill = "#0A0A1A", color = NA)
  )


Interactive Version (Plotly)

Code
# Build interactive H-R diagram using plotly
# ggplotly() converts our ggplot into an interactive chart
# Hovering over any point shows the star type and values

p_interactive <- ggplot(stars_clean, aes(
  x     = temp,
  y     = log_magnitude,
  color = type_grouped,
  text  = paste0("Type: ", type_grouped,
                 "<br>Temp: ", temp, " K",
                 "<br>Log Magnitude: ", round(log_magnitude, 2))
)) +
  geom_point(size = 3, alpha = 0.85) +
  scale_x_reverse(labels = scales::comma) +
  scale_color_manual(values = star_colors, name = "Star Type") +
  labs(
    title   = "Hertzsprung-Russell Diagram (Interactive)",
    x       = "Surface Temperature (K)",
    y       = "Log Magnitude (transformed)",
    caption = "Data source: dslabs R package (stars dataset)"
  ) +
  theme_dark(base_size = 12) +
  theme(
    plot.background  = element_rect(fill = "#0A0A1A", color = NA),
    panel.background = element_rect(fill = "#0D0D2B", color = NA),
    panel.grid.major = element_line(color = "#1E1E3A"),
    plot.title       = element_text(color = "white", face = "bold"),
    axis.text        = element_text(color = "#A0A0C0"),
    axis.title       = element_text(color = "white"),
    legend.background = element_rect(fill = "#0A0A1A", color = NA),
    legend.text      = element_text(color = "#C0C0D0"),
    legend.title     = element_text(color = "white"),
    legend.key       = element_rect(fill = "#0A0A1A", color = NA)
  )

# Convert to interactive plotly chart
ggplotly(p_interactive, tooltip = "text")
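The tooltip is an ordinary character vector built with paste0(), so it can be previewed without rendering the chart. The sketch below is a suggestion rather than part of the chart above: it adds the star's name to the label, uses the untransformed magnitude, and works on three rows copied from the glimpse() output. The names stars_sample and tooltip_text are illustrative.

```r
# Three rows copied from the glimpse() output above (not the full dataset)
stars_sample <- data.frame(
  star      = c("Sun", "SiriusA", "Canopus"),
  magnitude = c(4.8, 1.4, -3.1),
  temp      = c(5840L, 9620L, 7400L),
  type      = c("G", "A", "F")
)

# Build one HTML-formatted label per star; "<br>" starts a new line
# in the plotly hover box
tooltip_text <- paste0(
  "Star: ", stars_sample$star,
  "<br>Type: ", stars_sample$type,
  "<br>Temp: ", stars_sample$temp, " K",
  "<br>Magnitude: ", stars_sample$magnitude
)

# Inspect the first label before wiring it into aes(text = ...)
tooltip_text[1]
```

Checking the strings this way makes it easier to catch a mislabeled field before it reaches the interactive chart.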

Description & Insights

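The Hertzsprung-Russell (H-R) diagram is a foundational tool in astrophysics, and recreating it with the stars dataset from dslabs reveals several patterns. Each point represents one of 96 stars plotted by surface temperature (x-axis, reversed so hotter stars appear on the left) against log₁₀(|magnitude| + 1) (y-axis). Because the absolute value discards the sign, very bright stars (negative magnitudes) and faint stars (positive magnitudes) can land at similar heights, so the vertical ordering is not the conventional bright-at-the-top layout. In the static chart, point size is derived from temperature, so larger circles indicate cooler stars.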

The most striking feature of an H-R diagram is the main sequence: a diagonal band running from the upper left (hot, luminous blue stars) to the lower right (cool, dim red dwarfs). This band contains the majority of stars, including our Sun (a G-type star near the middle). Stars spend most of their lives on the main sequence, fusing hydrogen in their cores.

Two groups stand apart from the main sequence: the red giants and supergiants in the upper right (cool but extremely luminous, which means they must be enormous), and the white dwarfs in the lower left (hot but very dim, which means they are tiny; they are the remnants of stars that have exhausted their fuel). The DA and DB white dwarfs sit distinctly away from the main sequence, as stellar theory predicts.
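The upper-left to lower-right layout described here depends on plotting magnitude itself on a reversed y-axis, since lower magnitudes mean brighter stars. The following is a minimal sketch of that conventional layout, assuming ggplot2 is installed; it uses six rows copied from the glimpse() output instead of the full dataset, and hr_plot is an illustrative name.

```r
library(ggplot2)

# Six rows copied from the glimpse() output above (not the full dataset)
stars_sample <- data.frame(
  star      = c("Sun", "SiriusA", "Canopus", "Arcturus", "AlphaCentauriA", "Vega"),
  magnitude = c(4.8, 1.4, -3.1, -0.4, 4.3, 0.5),
  temp      = c(5840, 9620, 7400, 4590, 5840, 9900),
  type      = c("G", "A", "F", "K", "G", "A")
)

# Conventional H-R axes: hot stars on the left (reversed x) and
# bright stars at the top (reversed y, because lower magnitude = brighter)
hr_plot <- ggplot(stars_sample, aes(x = temp, y = magnitude, color = type)) +
  geom_point(size = 3) +
  scale_x_reverse() +
  scale_y_reverse() +
  labs(
    x = "Surface Temperature (K)",
    y = "Magnitude (lower = brighter)"
  )

hr_plot
```

Keeping magnitude untransformed preserves its sign, so the brightest stars stay at the top of the chart and the faintest at the bottom.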

The interactive Plotly version allows hovering over individual stars to see their spectral type, temperature, and transformed magnitude, adding a layer of exploration not possible in a static chart.

This visualization differs markedly from the typical examples in the dslabs tutorial, which generally use gapminder for scatterplots or murders for regression lines. The H-R diagram uses astronomical data, a reversed x-axis (a deliberate non-standard choice), a custom dark space-themed background, and bubble sizing, none of which appear in the course examples.