Install and load required packages

library(tidyverse)
library(dslabs)
library(RColorBrewer)
library(viridis)
library(ggrepel)
library(ggthemes)
library(highcharter)
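
The library() calls above assume every package is already installed. A minimal sketch for checking that first is shown here; `required_pkgs` and `missing_pkgs` are illustrative names, and the install line is left commented out so nothing is installed without intent.

```r
# Packages loaded in this document
required_pkgs <- c("tidyverse", "dslabs", "RColorBrewer", "viridis",
                   "ggrepel", "ggthemes", "highcharter")

# Compare against the packages installed in the current library paths
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```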

Load the dataset on physical properties of stars

data("stars")
view(stars)
head(stars)
## # A tibble: 6 × 6
##       bv absmag     lum   temp radiussun distance
##    <dbl>  <dbl>   <dbl>  <dbl>     <dbl>    <dbl>
## 1  1.86   -5.09   9454.  3316.     297.     170. 
## 2 -0.013 -10    870964. 10290.     296.  100000  
## 3  1.5    -5.47  13415.  3794.     271.     153. 
## 4  0.673  -6.52  35253.  5696.     195.     562. 
## 5  0.02   -8.09 149968.  9882.     133.  100000  
## 6  1.63   -1.88    490.  3608.      57.2     76.4

Draw a basic scatter plot

p1 <- stars |>
  ggplot(aes(x = temp, y = distance)) +
  geom_point(aes(colour = bv))

p1

Filter out the distant outliers to see whether a pattern becomes easier to see

stars1 <- stars |>
  filter(distance <= 25000)
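
The cutoff can be written as `!distance > 25000` or as `distance <= 25000`. R evaluates the comparison before the negation, so the two are equivalent. The sketch below checks this using only the six distances printed by head(stars) above; the vector and the `keep_*` names are illustrative, not part of the original analysis.

```r
# Distances (LY) of the six stars printed by head(stars)
distance <- c(170, 100000, 153, 562, 100000, 76.4)

# `!` binds more loosely than `>`, so this is !(distance > 25000)
keep_negated <- !distance > 25000
keep_direct  <- distance <= 25000

# Do both conditions keep exactly the same rows?
identical(keep_negated, keep_direct)

# Number of rows the cutoff would drop
sum(!keep_direct)
```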

Plot stars1

p2 <- stars1 |>
  ggplot(aes(x = temp, y = distance)) +
  geom_point(aes(colour = bv))

p2

Use a different color scale. Reverse its direction so that low B-V values (hotter stars) get the bright end of the palette and high B-V values (cooler stars) get the dark end.

p3 <- stars1 |>
  ggplot(aes(x = temp, y = distance)) +
  geom_point(aes(colour = bv)) +
  scale_color_viridis(discrete = FALSE, option = "plasma", direction = -1)

p3
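
To see what `direction = -1` does, it can help to print a few colors from the palette in both directions. This is a small sketch using the `viridis()` palette function with `option = "plasma"`; the object names are illustrative. By default, plasma runs from dark blue-purple at low values to yellow at high values, and reversing it swaps the two ends.

```r
library(viridis)

# Five evenly spaced colors from the plasma palette, low to high
plasma_default <- viridis(5, option = "plasma")

# The same palette reversed, as used in the scale above
plasma_reversed <- viridis(5, option = "plasma", direction = -1)

plasma_default
plasma_reversed
```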

Modify the plot to include labels and change the theme.

p4 <- stars1 |>
  ggplot(aes(x = temp, y = distance)) +
  geom_point(aes(colour = bv), alpha = 0.75, size = 2.5) +
  scale_color_viridis(discrete = FALSE, option = "plasma", direction = -1) +
  xlab("Temperature (K)") +
  ylab("Distance from Solar System (LY)") +
  ggtitle("Relationships between Physical Properties of Selected Stars") + 
  theme_few(base_family = "serif")

p4

Edit the legend and theme, remove the grid lines, and center the title

p5 <- stars1 |>
  ggplot(aes(x = temp, y = distance)) +
  geom_point(aes(colour = bv), alpha = 0.9, size = 2.5, pch = 8) +
  scale_color_viridis(discrete = FALSE, option = "plasma", direction = -1) +
  xlab("Temperature (K)") +
  ylab("Distance from Solar System (LY)") +
  ggtitle("Physical Properties Of Our Nearest Stars") + 
  theme_gdocs() +
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = c(0.7, 0.9),
        legend.direction = "horizontal",
        panel.background = element_rect(fill = "black", colour = "white"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  labs(colour = "B-V Color Index")
  
p5

Essay

I used the stars dataset from the dslabs package. This dataset describes the physical properties of 404 nearby stars relative to our Solar System and the Sun. It includes the following variables: bv (B-V color index), absmag, lum, temp (temperature, K), radiussun, and distance (distance from the Solar System, LY).

I chose to compare temperature, distance, and B-V index. The preliminary visualization included two outliers: stars with very large distances relative to their temperature and very low B-V indices. I removed these outliers (distances above 25,000 LY) to see whether the remaining points showed a pattern that could be described with a regression line or curve.

I tried adding a linear regression line and a loess curve to the final plot, but neither appeared. A smoother that fails to draw usually points to a problem in the code or data rather than to a poor model fit, so this by itself says little about whether either model suits the data. Even so, the relationship between temperature and B-V index is easier to see after discarding the outliers. We would expect the B-V index to correlate inversely with temperature, since hotter stars have lower index values and cooler stars have higher ones, and the visualization shows this. There does not appear to be a strong relationship between distance and the other two variables.
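
One way a regression line could be layered onto this kind of plot, and the temperature/B-V relationship checked numerically, is sketched below. It is not the code used for the plots above: it uses only the six rows printed by head(stars) as a stand-in for the full data, and the names `stars_head`, `stars_near`, `r_bv_temp`, and `p_smooth` are illustrative. The colour mapping stays inside geom_point(), so geom_smooth() inherits only x and y.

```r
library(ggplot2)

# The six rows printed by head(stars), used only as a small stand-in
stars_head <- data.frame(
  bv       = c(1.86, -0.013, 1.5, 0.673, 0.02, 1.63),
  temp     = c(3316, 10290, 3794, 5696, 9882, 3608),
  distance = c(170, 100000, 153, 562, 100000, 76.4)
)

# Same distance cutoff as in the analysis
stars_near <- subset(stars_head, distance <= 25000)

# Pearson correlation between temperature and B-V index
r_bv_temp <- cor(stars_near$temp, stars_near$bv)
r_bv_temp

# Scatter plot with a straight-line fit of distance on temperature
p_smooth <- ggplot(stars_near, aes(x = temp, y = distance)) +
  geom_point(aes(colour = bv)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "grey40")

p_smooth
```

On a black panel background, a light line colour such as "white" would probably be easier to see than a dark one.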

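I inverted the “plasma” palette from the viridis package so that hot stars (low B-V index) are drawn in bright yellow and cool stars (high B-V index) in blue/indigo. This follows the everyday intuition that brighter colors mean hotter, although it is the reverse of the stars' true colors: hot stars actually appear blue-white and cool stars appear orange-red. The aim was to make the visualization easier to interpret for readers who are not familiar with the B-V index. I also changed the plot background and point shapes so that the points look like stars against the night sky.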