library("dslabs")
data(package="dslabs")
# list.files(system.file("script", package = "dslabs"))
DS_labs_h
We will plot two multivariable charts using a dataset chosen from the dslabs library.
First, I load dslabs.
“Stars” is a beautiful yet simple dataset that I was fortunate to come across in dslabs. Its four columns give each star’s name, magnitude, temperature, and type. With so few variables, I could count 96 entries by eye and saw no missing values, but we will check again to be thorough. I also did some research to understand what the magnitude numbers and types mean.
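The magnitude in this dataset is the absolute magnitude: how bright the star would appear from a standard distance of 10 parsecs, so the star’s actual distance is already accounted for. This is NOT the apparent magnitude, which is how bright we see the star from Earth. The Sun’s value of 4.8 is its absolute magnitude; its apparent magnitude is about -26.7.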
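To make the magnitude scale concrete, here is a small sketch of the standard relation between a magnitude difference and a brightness ratio, where five magnitudes correspond to a factor of 100. The helper name `brightness_ratio` is my own, chosen for illustration; the two magnitudes are the Sun and SiriusA values from the dataset.

```r
# Brightness ratio implied by a magnitude difference.
# A difference of 5 magnitudes corresponds to a factor of 100 in brightness,
# and a LOWER magnitude number means a BRIGHTER star.
# `brightness_ratio` is a hypothetical helper, not part of dslabs.
brightness_ratio <- function(mag_faint, mag_bright) {
  100^((mag_faint - mag_bright) / 5)
}

sun_mag    <- 4.8  # Sun, from the stars dataset
sirius_mag <- 1.4  # SiriusA, from the stars dataset

ratio <- brightness_ratio(sun_mag, sirius_mag)
ratio
```

By this measure, SiriusA comes out roughly 23 times brighter than the Sun, even though its magnitude number is smaller.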
The spectral type tells us about the star’s properties and chemical composition. The Harvard College Observatory classified stars based on the absorption lines in their spectra, and the resulting sequence runs from hottest to coolest. The O-type stars are the hottest, showing a visibly blue color and emitting mostly at short wavelengths, while the M-type stars are the coolest, emitting mostly at longer, redder wavelengths. I will provide a visual table for easier reference.
The temperature is measured in Kelvin (K) as the standard unit for stellar temperatures.
Image credit: Lucas VB, Wikimedia
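The link between temperature and color can be sketched with Wien's displacement law, which says the wavelength where a hot body emits most strongly is inversely proportional to its temperature. This is a rough illustration that treats stars as ideal black bodies; the function name `peak_wavelength_nm` is my own, and the three temperatures are values that appear in the dataset.

```r
# Wien's displacement law: peak wavelength = b / T.
# Treating a star as an ideal black body is an approximation.
wien_b_nm_K <- 2.897771955e6  # Wien's displacement constant, in nm * K

# Hypothetical helper: peak emission wavelength in nanometres.
peak_wavelength_nm <- function(temp_K) {
  wien_b_nm_K / temp_K
}

# Surface temperatures in kelvin, taken from the stars dataset.
temps <- c(cool = 3200, sun_like = 5840, hot = 12140)

peak_nm <- peak_wavelength_nm(temps)
round(peak_nm)
```

Visible light spans very roughly 380 to 750 nm, so the coolest star here peaks beyond the red end, the Sun-like star peaks inside the visible range, and the hottest star peaks beyond the blue end.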
Next, we select the libraries and load our data into our global environment.
data("stars")
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(ggalluvial)
Let’s take a look at the data set.
head(stars)
            star magnitude temp type
1            Sun       4.8 5840    G
2        SiriusA       1.4 9620    A
3        Canopus      -3.1 7400    F
4       Arcturus      -0.4 4590    K
5 AlphaCentauriA       4.3 5840    G
6           Vega       0.5 9900    A
Looking at the data set, we have 4 columns and 96 entries.
str(stars)
'data.frame': 96 obs. of 4 variables:
$ star : Factor w/ 95 levels "*40EridaniA",..: 87 85 48 38 33 92 49 79 77 47 ...
$ magnitude: num 4.8 1.4 -3.1 -0.4 4.3 0.5 -0.6 -7.2 2.6 -5.7 ...
$ temp : int 5840 9620 7400 4590 5840 9900 5150 12140 6580 3200 ...
$ type : chr "G" "A" "F" "K" ...
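The "no missing values" observation can be checked in code rather than by eye. The sketch below uses a stand-in data frame built from the six rows printed by `head(stars)`, so that it runs without dslabs installed; the name `stars_preview` is hypothetical.

```r
# Stand-in for the real data: the six rows printed by head(stars).
# `stars_preview` is an illustrative name, not an object from dslabs.
stars_preview <- data.frame(
  star      = c("Sun", "SiriusA", "Canopus", "Arcturus", "AlphaCentauriA", "Vega"),
  magnitude = c(4.8, 1.4, -3.1, -0.4, 4.3, 0.5),
  temp      = c(5840, 9620, 7400, 4590, 5840, 9900),
  type      = c("G", "A", "F", "K", "G", "A")
)

dim(stars_preview)                             # number of rows, number of columns
na_per_column <- colSums(is.na(stars_preview)) # missing values in each column
na_per_column
anyNA(stars_preview)                           # TRUE if any value is missing
```

Running the same three calls on `stars` itself should confirm the row count and whether any column contains missing values.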
I will convert the data types to make sure each column is in the correct format for my plot.
I also noted that three star types fall outside the regular spectral categories: DA, DB, and DF. These stars are classified as white dwarfs, as their luminosity is quite faint compared to the range of the others.
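One way to spot types outside the regular O-to-M sequence is to compare the unique values against the seven standard letters. This is a minimal sketch on a made-up vector of type codes, not the real column; `types`, `standard_types`, and `extra_types` are illustrative names.

```r
# Made-up sample of spectral type codes (illustrative, not the real column).
types <- c("G", "A", "F", "K", "DA", "M", "DB", "DF", "B", "O", "G")

# The seven standard spectral classes, hottest to coolest.
standard_types <- c("O", "B", "A", "F", "G", "K", "M")

# Anything not in the standard sequence.
extra_types <- setdiff(unique(types), standard_types)
extra_types

# Listing every level explicitly matters: values missing from `levels`
# would silently become NA in the resulting factor.
types_ordered <- factor(types, levels = c(standard_types, sort(extra_types)))
levels(types_ordered)
anyNA(types_ordered)
```

Building the level vector from the data in this way avoids accidentally dropping a type when the factor order is set by hand.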
stars_1 <- stars |>
  mutate(
    star = as.character(star),
    magnitude = as.numeric(magnitude),
    temp = as.numeric(temp),
    type = as.factor(type)
  ) |>
  drop_na(magnitude, temp, type) |>
  mutate(type = factor(type, levels = c("O", "B", "A", "F", "G", "K", "M", "DA", "DB", "DF")))
I will create an interactive plot that displays the star’s magnitude on the Y-axis and the temperature on the X-axis. For this specific type of plot, known as a Hertzsprung-Russell diagram, both axes are typically reversed: temperature decreases to the right, and magnitude values decrease toward the top, so the brightest stars sit highest (a lower magnitude number means a brighter star).
P1 <- ggplot(stars_1, aes(x = temp, y = magnitude, color = type, text = paste0(
  "Star: ", star,
  "\nMagnitude: ", magnitude,
  "\nTemperature: ", temp, " K",
  "\nType: ", type
))) +
  geom_point(size = 3, alpha = 0.8) +
  # Both axes are reversed, following the HR diagram convention
  scale_x_reverse() +
  scale_y_reverse() +
  labs(
    x = "Temperature (K)",
    y = "Absolute Magnitude",
    color = "Spectral Type",
    title = "Hertzsprung-Russell Diagram") +
  scale_color_manual(
    values = c("O" = "blue",
               "B" = "purple",
               "A" = "grey",
               "F" = "yellow",
               "G" = "gold",
               "K" = "orange",
               "M" = "red",
               "DA" = "magenta",
               "DF" = "pink",
               "DB" = "cyan"
    )) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
I will add a tooltip to populate my interactive diagram with the stars’ information. Note that luminosity increases as the magnitude number gets lower.
P1 <- ggplotly(P1, tooltip = "text")
P1
While creating my diagram, I realized that I had used a commonly accepted type of diagram to represent stars. My inspiration came from a video of a projected bubble plot that illustrated different nations’ progress over time. I appreciated the colors and considered how the stars could be arranged in the diagram in a similar way.
The colors in this diagram reflect surface temperature rather than the Doppler effect. Cooler stars emit most of their light at longer wavelengths and look reddish, while the large hot giants emit most of theirs at shorter wavelengths and look blue, regardless of how near or far they are. I do notice the red dwarfs clustering tightly near our visibility range; thankfully, the tooltip and plotly make it possible to zoom into the HR diagram and see the separate values within a cluster.
Since this is a basic, common diagram, I will make another multivariable plot.
First, I want to isolate the names of the stars so I can look up their diameters relative to our sun.
star_names <- stars |> select(star)
Hand-collected star diameters provide an approximate scale for our stars’ sizes. It’s important to note that a star’s size is set by a balance: outward pressure from the fusion reactions in its core pushes against the inward pull of its own gravity. This balance can shift, and some stars pulsate, expanding and shrinking in an interconnected rhythm, which is why these values are only approximations.
I collected the diameter of each star relative to our sun from the web and divided it by 2 to get a radius column. Online sources quote sizes in either solar diameters or solar radii, though astronomers usually use the solar radius. Note that a size ratio relative to the sun is the same whether it is stated as a diameter or a radius, so the halved column is a rescaled version of the same measurement. I will try to determine which is best for scaling in my diagram.
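Whether the diameter or the halved column is used does not matter once the sizes are grouped by quantiles, because dividing every value by 2 keeps the ranking the same. Here is a minimal sketch of that check, using six diameters from the size table; `tercile_bin` is a hypothetical helper, and the tercile breaks with `include.lowest = TRUE` are my own choice for the illustration.

```r
# Diameters relative to the Sun, copied from the first six rows of the size table.
diameter_x_sun <- c(1, 1.71, 65, 25.7, 1.22, 2.36)
half_diameter  <- diameter_x_sun / 2

# Hypothetical helper: split a numeric vector into three equal-count bins.
# probs = 0, 1/3, 2/3, 1 gives four breaks and therefore three bins;
# include.lowest = TRUE keeps the minimum value inside the first bin.
tercile_bin <- function(x) {
  cut(x,
      breaks = quantile(x, probs = seq(0, 1, length.out = 4), na.rm = TRUE),
      labels = c("Small", "Medium", "Large"),
      include.lowest = TRUE)
}

bins_from_diameter <- tercile_bin(diameter_x_sun)
bins_from_half     <- tercile_bin(half_diameter)

# The two versions should assign every star to the same bin.
identical(bins_from_diameter, bins_from_half)
table(bins_from_diameter)
```

The choice of unit would only start to matter if the raw values were mapped directly to something like point size in a bubble plot.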
setwd('/Users/oworenibanseyo/Desktop/Data 110 2025/Datasets')
Stars_Size <- read_csv("All_95_Stars_sizes.csv")
Rows: 95 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Star, Spectral_Type
dbl (3): Temperature_K, Diameter_x_Sun, Radius_x_Sun
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Stars_Size)
# A tibble: 6 × 5
  Star             Temperature_K Diameter_x_Sun Spectral_Type Radius_x_Sun
  <chr>                    <dbl>          <dbl> <chr>                <dbl>
1 Sun                       5840           1    G                    0.5
2 Sirius A                  9620           1.71 G                    0.855
3 Canopus                   7400          65    A                   32.5
4 Arcturus                  4590          25.7  F                   12.8
5 Alpha Centauri A          5840           1.22 K                    0.61
6 Vega                      9900           2.36 G                    1.18
Before I sort this, I will exclude the white dwarfs because they are so much fainter than the other stars that they could appear as anomalies on my plot.
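stars_2 <- Stars_Size |>
  filter(!(Spectral_Type %in% c("DA", "DB", "DF")))
stars_3 <- stars_2 |>
  mutate(
    # Cut at terciles: probs = 0, 1/3, 2/3, 1 gives four breaks and three bins.
    # seq(0, 1, 0.33) stops at 0.99, which would leave the largest values unbinned.
    # include.lowest = TRUE keeps the minimum value in the first bin.
    temp_bin = cut(Temperature_K,
                   breaks = quantile(Temperature_K, probs = seq(0, 1, length.out = 4), na.rm = TRUE),
                   labels = c("Cool", "Moderate", "Hot"),
                   include.lowest = TRUE),
    size_bin = cut(Radius_x_Sun,
                   breaks = quantile(Radius_x_Sun, probs = seq(0, 1, length.out = 4), na.rm = TRUE),
                   labels = c("Small", "Medium", "Large"),
                   include.lowest = TRUE)
  )
Here I count each combination of my variables while making sure the spectral types are in order from O to M, hottest to coolest.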
stars_4 <- stars_3 |>
  count(Spectral_Type, temp_bin, size_bin) |>
  mutate(Spectral_Type = factor(Spectral_Type, levels = c("O", "B", "A", "F", "G", "K", "M")))
Using an alluvial plot, I’ll show the spectral type, temperature, and size of the stars.
P2 <- ggplot(stars_4, aes(axis1 = Spectral_Type, axis2 = temp_bin, axis3 = size_bin, y = n)) +
  geom_alluvium(aes(fill = Spectral_Type), width = 1/12, alpha = 0.8) +
  geom_stratum(width = 1/12, fill = "grey", color = "black") +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Spectral Type", "Temperature", "Size"),
                   expand = c(.05, .05)) +
  labs(title = "Stars by Spectral Type, Temperature, and Size",
       y = "Count",
       caption = "Source: No. 24 Stars_Dataset_DsLabs") +
  theme_minimal()
P2
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
Originally I wanted to keep my manually chosen colors from my HR diagram, but those colors proved awful for the alluvial plot, so I’ll keep the default assigned colors. I’ve kept the standard ordering of spectral types, from O to M.
Conclusion: We can see that most of the cool-burning M-type stars cluster around the bottom-right region of the HR diagram. Although I didn’t succeed in making a bubble plot scaled by star size, I can still draw valuable insights from the relationships shown in the scatter plot and the alluvial plot.
Sources:
Information on Stars: https://lco.global/spacebook/stars/types-stars/
Star sizes: https://www.britannica.com/place/Sirius-star