Aspects of Nearby Stars

# it is better to let the user manually install packages in their environment, but if you want to automatically 
# install them, un-comment the next line

#install.packages(c("flextable", "ggplot2", "GGally", "gridExtra", "scales", "tidyverse", "summarytools"))

# define shhh to suppress warning messages on library loads following the example at 
# https://stackoverflow.com/questions/18931006/how-to-suppress-warning-messages-when-loading-a-library

shhh <- suppressPackageStartupMessages        # It's a library, so shhh!

shhh(library(flextable))
shhh(library(ggplot2))
shhh(library(GGally))
shhh(library(gridExtra))
shhh(library(scales))
shhh(library(tidyverse))
shhh(library(summarytools))

# Changes the ggplot default behavior to center plot titles. Only needed once.
theme_update(plot.title = element_text(hjust = 0.5))

Objective

The purpose of this document is to examine the characteristics of 14,876 nearby stars, i.e., stars within 100 light-years (ly) of our sun. This is accomplished by examining descriptive statistics for this star population and plots of several parameters.

Data and Data Source

NASA’s Transiting Exoplanet Survey Satellite (TESS) is an all-sky survey mission to detect exoplanets around nearby bright stars. TESS was launched on April 18, 2018.

TESS has created a catalog containing information on 128 parameters about the Candidate Target List (CTL) of over 9.5 million stars in our galaxy. The TESS Input Catalog (TIC) includes the best measured values of the stellar distances, masses, sizes, luminosities, compositions, and other properties, totalling 144 parameters in all. The TIC v8.2 data are publicly available without cost and are hosted by the Mikulski Archive for Space Telescopes (MAST).

Using the MAST CasJobs query GUI, the values of several parameters for each of the 14,876 stars within 100 ly of our sun were downloaded as a CSV file.

Potential Questions

By examining the TIC data for the nearby stars, we can characterize the stars comprising the local neighborhood. Questions include:

Are the values for each parameter reasonable?
What fraction of the parameter values have not yet been determined or estimated for the nearby stars?
What collinearities between parameters might allow the prediction of one from the other?
Are the Spectral Groups found in the local star neighborhood representative of all stars?
What is the average density of the nearby stars?

Parameter Descriptions

The following table lists the TESS names of the 14 parameters selected for this study, their data types, and their descriptions. Also, in the process of reading in the external CSV file, some of the columns were renamed to aid in understanding what they represent.

# uses library(flextable) to format the parameter description into a table

# define data frame describing each column of the star data read in from the csv file
columns_df <- tibble(
  Renamed = c("ID", "RA", "DEC", "Parallax", "B-magnitude", "V-magnitude", "TESSmagnitude", "Teff", "Log-g",
              "Metallicity", "Radius", "Mass", "Luminosity", "Distance", "Spectral_Group"),
  TESS_Name = c("ID", "ra", "dec", "plx", "Bmag", "Vmag", "Tmag", "Teff", "logg", "MH", "rad", "mass", "lum", "d",
                "(calculated)"),
  Type = c("bigint[8]", "float[8]", "float[8]", "real[4]", "real[4]", "real[4]", "real[4]", "real[4]", "real[4]",
           "real[4]", "real[4]", "real[4]", "real[4]", "real[4]", "character[1]"),
  Description = c("TESS Input Catalog ID", "Right Ascension (i.e., East-West) Coordinate, J2000 (deg)",
                "Declination (i.e., North-South) Coordinate, J2000 (deg)",
                "Parallax (milliarcseconds, or mas), i.e., shift in position as the earth orbits the sun",
                "Johnson B Magnitude (mag), i.e., apparent brightness of the blue components of the light",
                "Johnson V Magnitude (mag), i.e., apparent brightness of the visible-spectrum components of the light",
                "TESS Magnitude (mag), i.e., the apparent brightness as observed by TESS",
                "Effective Temperature (K) of the star if it were a black body",
                "log Surface Gravity of the star (centimeter-gram-second, or cgs, units)",
                "Metallicity (dex), i.e., the ratio of heavier element abundance (e.g., Iron, or Fe) to that of Hydrogen",
                "Stellar Radius (solar radii)", "Stellar Mass (solar masses)",
                "Stellar Luminosity (solar units), or the inherent brightness of the star",
                "Distance from the sun to the star (parsecs in the TESS database, converted here to light-years, or ly)",
                "Letter designating the spectral type of the star (i.e., from hottest to coolest, O, B, A, F, G, K, M)")
)

#set the table styling
set_flextable_defaults(font.family = "Calibri (Body)",
                       font.size = 12, 
                       digits = 0, 
                       border.color = "#000000",
                       padding.bottom = 1,
                       padding.top = 1,
                       padding.left = 3,
                       padding.right = 1)

 #convert the dataframe to flextable object
ft <- flextable(columns_df, defaults = TRUE)

# set the column widths of the table 
ft <- ft %>%
    width(j = 1,
          width = 1) %>%
    width(j = 2,
          width = 1) %>%
    width(j = 3,
          width = 1) %>%
    width(j =4,
          width = 4)

# add caption
ft <- set_caption(ft, caption = "Table 1: The Selected TESS Data Columns", 
                  style = "Table Caption")

# make the column headers bold
ft <- bold(ft, bold = TRUE, part = "header")

# print the flextable object to a table
ft

Table 1: The Selected TESS Data Columns
Renamed	TESS_Name	Type	Description
ID	ID	bigint[8]	TESS Input Catalog ID
RA	ra	float[8]	Right Ascension (i.e., East-West) Coordinate, J2000 (deg)
DEC	dec	float[8]	Declination (i.e., North-South) Coordinate, J2000 (deg)
Parallax	plx	real[4]	Parallax (milliarcseconds, or mas), i.e., shift in position as the earth orbits the sun
B-magnitude	Bmag	real[4]	Johnson B Magnitude (mag), i.e., apparent brightness of the blue components of the light
V-magnitude	Vmag	real[4]	Johnson V Magnitude (mag), i.e., apparent brightness of the visible-spectrum components of the light
TESSmagnitude	Tmag	real[4]	TESS Magnitude (mag), i.e., the apparent brightness as observed by TESS
Teff	Teff	real[4]	Effective Temperature (K) of the star if it were a black body
Log-g	logg	real[4]	log Surface Gravity of the star (centimeter-gram-second, or cgs, units)
Metallicity	MH	real[4]	Metallicity (dex), i.e., the ratio of heavier element abundance (e.g., Iron, or Fe) to that of Hydrogen
Radius	rad	real[4]	Stellar Radius (solar radii)
Mass	mass	real[4]	Stellar Mass (solar masses)
Luminosity	lum	real[4]	Stellar Luminosity (solar units), or the inherent brightness of the star
Distance	d	real[4]	Distance from the sun to the star (parsecs in the TESS database, converted here to light-years, or ly)
Spectral_Group	(calculated)	character[1]	Letter designating the spectral type of the star (i.e., from hottest to coolest, O, B, A, F, G, K, M)

Determining the Spectral Group

After reading in the data for the nearby stars from the CSV file, the Spectral Group for each star was determined from its effective Temperature (Teff) using the temperature ranges in Table 2 below. Also shown is the percentage of all stars for each Spectral Group obtained from https://www.star-facts.com/types-of-stars/

# uses library(flextable) to format the spectral group temperature ranges into a table. Also uses the dplyr library

# define data frame giving the Teff range (K) and the percentage of all stars for each spectral group
# (the ranges match the thresholds used to assign Spectral_Group further below)
spg_df <- tibble(
  Spectral_Group = c("O", "B", "A", "F", "G", "K", "M"),
  min__Teff = c("30,000", "10,000", "7,500", "6,000", "5,000", "3,500", "2,200"),
  Teff_less_than = c("100,000", "30,000", "10,000", "7,500", "6,000", "5,000", "3,500"),
  Pct_All_Stars = c(0.0003, 0.13, 0.6, 3, 7.6, 12.1, 76.45)
)

#set the table styling
set_flextable_defaults(font.family = "Calibri (Body)",
                       font.size = 12, 
                       digits = 0, 
                       border.color = "#000000",
                       padding.bottom = 1,
                       padding.top = 1,
                       padding.left = 3,
                       padding.right = 1)

 #convert the dataframe to flextable object
spg_ft <- flextable(spg_df, defaults = TRUE)

# set the column widths of the table 
spg_ft <- spg_ft %>%
    width(j = 1,
          width = 1) %>%
    width(j = 2,
          width = 1) %>%
    width(j = 3,
          width = 1) %>%
    width(j = 4,
          width = 1.5) 

# add caption
spg_ft <- set_caption(spg_ft, caption = "Table 2: The Spectral Group relationship to Teff", 
                  style = "Table Caption")

# make the column headers bold
spg_ft <- bold(spg_ft, bold = TRUE, part = "header")

# center-align each column
spg_ft <- align(spg_ft, align = "center", part = "all")

# print the flextable object to a table
spg_ft

Table 2: The Spectral Group relationship to Teff
Spectral_Group	min__Teff	Teff_less_than	Pct_All_Stars
O	30,000	100,000	0.0003
B	10,000	30,000	0.1300
A	7,500	10,000	0.6000
F	6,000	7,500	3.0000
G	5,000	6,000	7.6000
K	3,500	5,000	12.1000
M	2,200	3,500	76.4500

# read in the data from the CSV file into a dataframe

star_df <- read.csv('tic_nearby_awcox.csv') # open and read the csv file containing the stellar data

# the data is read in as character strings, The following command converts the character strings to numeric.
star_df[] <- lapply(star_df, function(x) if(is.character(x)) as.numeric(x) else x)

# rename the dataframe columns to promote better understanding
names(star_df) <- c('ID','RA','DEC','Parallax','B-magnitude','V-magnitude','TESSmagnitude','Teff','Log-g','Metallicity',
                    'Radius','Mass','Luminosity','Distance')

# convert the Distance in parsecs (pc) to light-years (ly)
star_df$Distance <- star_df$Distance*3.2616         # convert parsec (pc) to light-year (ly)

# Determine the Spectral Group from the value of Teff
sp_type <- vector()

for(i in 1:nrow(star_df)) {
  if(is.na(star_df$Teff[i])) {
    sp_type[i] <- NA
  }
  else {
    if (30000 <= star_df$Teff[i] && star_df$Teff[i] < 100000) {
      sp_type[i] = 'O'
    } else if (10000 <= star_df$Teff[i] && star_df$Teff[i] < 30000) {
      sp_type[i] = 'B'
    } else if (7500 <= star_df$Teff[i] && star_df$Teff[i] < 10000) {
      sp_type[i] = 'A'
    } else if (6000 <= star_df$Teff[i] && star_df$Teff[i] < 7500) {
      sp_type[i] = 'F'
    } else if (5000 <= star_df$Teff[i] && star_df$Teff[i] < 6000) {
      sp_type[i] = 'G'
    } else if (3500 <= star_df$Teff[i] && star_df$Teff[i] < 5000) {
      sp_type[i] = 'K'
    } else if (2200 <= star_df$Teff[i] && star_df$Teff[i] < 3500) {
      sp_type[i] = 'M'
    } else if (1400 <= star_df$Teff[i] && star_df$Teff[i] < 2200) {
      sp_type[i] = 'L'
    } else {
      sp_type[i] = 'T'
    }
  }
}

# add the spectral group as another column in the dataframe
star_df$Spectral_Group <- sp_type
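
The row-by-row classification above can also be written as a single vectorized call. The following is a minimal sketch, not the method used to produce the results in this document; the function name classify_spectral_group is hypothetical, and the break points are copied from the if/else chain above. One difference is assumed to be acceptable: a Teff of 100,000 K or more returns NA here rather than 'T'.

```r
# Hypothetical helper: assign a spectral group to each effective temperature (K)
# using base R's cut() instead of an explicit loop.
classify_spectral_group <- function(teff) {
  # Interval edges taken from the if/else chain: T, L, M, K, G, F, A, B, O
  breaks <- c(-Inf, 1400, 2200, 3500, 5000, 6000, 7500, 10000, 30000, 100000)
  labels <- c("T", "L", "M", "K", "G", "F", "A", "B", "O")
  # right = FALSE closes each interval on the left, i.e., [lower, upper)
  as.character(cut(teff, breaks = breaks, labels = labels, right = FALSE))
}

# Example with a few temperatures, including a missing value
classify_spectral_group(c(3398, 5772, 11535, NA, 3500))
```

Missing Teff values stay NA, which matches the behavior of the loop.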

Data Exploration

Descriptive Statistics for the Nearby Stars

The following table provides the mean, standard deviation, minimum, median, and maximum of each parameter, along with a small histogram and the fractions of valid and missing data.

# uses library(summarytools)

# return summary statistics of the data
print(dfSummary(star_df, graph.magnif = 0.75), method = 'render', silent = st_options("dfSummary.silent"))

Data Frame Summary

star_df

Dimensions: 14876 x 15
Duplicates: 0

Variable	Mean (sd)	min ≤ med ≤ max	IQR (CV)	Distinct values	Valid	Missing
ID [integer]	776936546 (667281629)	16855 ≤ 416519070 ≤ 2055502161	1275895539 (0.9)	14876	14876 (100.0%)	0 (0.0%)
RA [numeric]	215.2 (92.7)	0.1 ≤ 258.5 ≤ 360	132.2 (0.4)	14876	14876 (100.0%)	0 (0.0%)
DEC [numeric]	-7.9 (35.5)	-87.8 ≤ -15.9 ≤ 89.4	51.3 (-4.5)	14875	14876 (100.0%)	0 (0.0%)
Parallax [numeric]	71.1 (100)	0.9 ≤ 42.7 ≤ 1851.9	22.4 (1.4)	14401	14760 (99.2%)	116 (0.8%)
B-magnitude [numeric]	13 (3.8)	-0.1 ≤ 13.9 ≤ 21.8	5.2 (0.3)	4835	6142 (41.3%)	8734 (58.7%)
V-magnitude [numeric]	14.7 (4.6)	-0.1 ≤ 14.5 ≤ 22.1	7.5 (0.3)	7563	10932 (73.5%)	3944 (26.5%)
TESSmagnitude [numeric]	14.4 (4.8)	-1.3 ≤ 14 ≤ 21.1	8.6 (0.3)	13777	14876 (100.0%)	0 (0.0%)
Teff [numeric]	3780.1 (995.1)	2717 ≤ 3398 ≤ 11535	724 (0.3)	2673	7069 (47.5%)	7807 (52.5%)
Log-g [numeric]	4.8 (0.3)	3.6 ≤ 4.9 ≤ 5.3	0.3 (0.1)	6691	6965 (46.8%)	7911 (53.2%)
Metallicity [numeric]	-0.1 (0.3)	-1.8 ≤ -0.1 ≤ 0.5	0.3 (-2.5)	571	983 (6.6%)	13893 (93.4%)
Radius [numeric]	0.5 (0.4)	0.1 ≤ 0.4 ≤ 7.1	0.4 (0.8)	7000	7033 (47.3%)	7843 (52.7%)
Mass [numeric]	0.4 (0.3)	0.1 ≤ 0.3 ≤ 2.4	0.4 (0.7)	5533	6965 (46.8%)	7911 (53.2%)
Luminosity [numeric]	0.3 (1.2)	0 ≤ 0 ≤ 26.3	0.1 (4.3)	6720	6965 (46.8%)	7911 (53.2%)
Distance [numeric]	69.5 (25)	3.3 ≤ 76.5 ≤ 99.8	34 (0.4)	14363	14876 (100.0%)	0 (0.0%)

Spectral_Group [character]: Valid 7069 (47.5%), Missing 7807 (52.5%)
Value	Freq (% of Valid)
A	41 (0.6%)
B	2 (0.0%)
F	277 (3.9%)
G	648 (9.2%)
K	1875 (26.5%)
M	4226 (59.8%)

Generated by summarytools 1.0.0 (R version 4.1.2)
2022-02-25

Examining the statistics for each of the columns above, we note:

The ID value is not a stellar characteristic and is included only for reference purposes. It will be dropped from the analysis.
The RA and DEC values are the location coordinates in the sky and not used in this analysis. They will also be dropped.
The Parallax measurements are a function of the earth’s orbit and related to the Distance to the star. The Parallax values are redundant information and will be dropped.
The difference between the B-magnitude and the V-magnitude is related to the effective temperature (Teff) of the star. However, about 59% of the nearby stars are missing B-magnitude values, and about 27% are missing V-magnitude values. These magnitudes will also be dropped from the analysis.
TESSmagnitude values are present for all 14,876 nearby stars and the minimum and maximum values are realistic. The larger the TESSmagnitude value, the dimmer the star appears in the sky.
The range of Teff and Log-g values are each realistic, but values are present in each case for only about 50% of the nearby stars.
Metallicity values are only available for less than 7% of the nearby stars and this parameter will be dropped from the analysis.
The Radius and Mass values are realistic but available for only about 50% of the nearby stars.
Luminosity values are realistic except for the minimum value of 0 (which would imply no brightness and so would not be detectable). However, the Luminosity values are present for only roughly 50% of the nearby stars.
The values of Distance are realistic and there are no missing values.
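
The redundancy between Parallax and Distance noted above can be illustrated with the standard relation that the distance in parsecs is the reciprocal of the parallax in arcseconds. The sketch below is illustrative only: the function name parallax_to_ly is hypothetical, and the distances stored in the catalog are not assumed to be simple reciprocals of the parallax values.

```r
# Hypothetical helper: convert a parallax in milliarcseconds (mas)
# to a distance in light-years (ly).
parallax_to_ly <- function(parallax_mas) {
  distance_pc <- 1000 / parallax_mas   # 1 / (parallax in arcseconds)
  distance_pc * 3.2616                 # same pc-to-ly factor used for star_df
}

# The median Parallax in the summary table is 42.7 mas
parallax_to_ly(42.7)   # about 76.4 ly
```

This is close to the median Distance of 76.5 ly in the summary table, as expected if the two columns carry essentially the same information.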

Handling Missing Values

For the parameters being kept in the analysis, imputing any missing values based upon population means or by interpolation would likely lead to erroneous results. Some of these parameters are related to one another via physical constraints that may be simple or complex.

For example, the surface gravity g (reported as Log-g) is a function of both the Mass and the Radius of the star, since g is proportional to the Mass divided by the square of the Radius. However, neither the Mass nor the Radius can be measured directly, as we cannot put the star on a scale and the “surface” of the star is uncertain.
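
As a rough illustration of this constraint, the sketch below computes Log-g from a Mass and Radius given in solar units using Newtonian surface gravity, g = G M / R^2. The function name log_g_cgs and the rounded physical constants are assumptions made for this example; this is not how the TIC derives its logg values.

```r
# Hypothetical helper: log10 of the surface gravity in cgs units (cm/s^2)
# for a star with the given mass and radius in solar units.
log_g_cgs <- function(mass_solar, radius_solar) {
  G     <- 6.674e-8   # gravitational constant (cm^3 g^-1 s^-2), rounded
  M_sun <- 1.989e33   # solar mass (g), rounded
  R_sun <- 6.957e10   # solar radius (cm), rounded
  log10(G * mass_solar * M_sun / (radius_solar * R_sun)^2)
}

log_g_cgs(1, 1)       # the sun: about 4.44
log_g_cgs(0.3, 0.4)   # a smaller, less massive star has a higher Log-g
```

Because Log-g follows from the Mass and Radius in this way, an independently imputed value for any one of the three could contradict the other two.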

Consequently, it is likely best to just drop the specific missing parameter values when examining the different parameter characteristics.
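
Dropping only the missing values that affect a given calculation can be sketched as follows. The data frame toy_df and its values are made up for illustration and are not taken from the TIC.

```r
# Made-up example data with some missing values (NOT from the TIC)
toy_df <- data.frame(
  Mass       = c(0.2, 0.5, NA, 1.0, 0.8),
  Radius     = c(0.25, 0.5, 0.6, NA, 0.8),
  Luminosity = c(0.005, 0.04, NA, 1.0, 0.3)
)

# fraction of missing values in each column
missing_fraction <- colMeans(is.na(toy_df))

# a single-column statistic ignores only that column's missing values
mean_mass <- mean(toy_df$Mass, na.rm = TRUE)

# each pairwise correlation uses the rows where both columns are present
pairwise_cor <- cor(toy_df, use = "pairwise.complete.obs")
```

With this approach a star that lacks a Radius still contributes to the Mass statistics, rather than being discarded entirely.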

Parameter Boxplots

Boxplots provide a summary of a set of data using five quantities, i.e., the minimum value, first (lower) quartile, median, third (upper) quartile, and maximum value. A boxplot visually depicts the data distribution and skewness through displaying the data quartiles and median.

The box spans the middle 50% of the values (the interquartile range, or IQR), and the upper and lower “whiskers” extend from the box into the upper and lower 25% of the values, respectively. With the ggplot2 defaults used here, points lying more than 1.5 times the IQR beyond the box are drawn individually as outliers. These “outliers” may not be erroneous data, however.

In the case of Figure 1, below, each parameter has been separately “normalized” to have the range between 0 and 1 because the magnitudes of the parameter values vary widely. Each normalization was accomplished by subtracting the minimum value and dividing by the range.

subset_df = star_df[ , c('TESSmagnitude','Teff', 'Log-g', 'Radius', 'Mass', 'Luminosity', 'Distance')]

# uses the tidyr and dplyr libraries to reshape the original data into a df with columns 'variable' and 'value',
# removing NA values; the data for each variable is then rescaled by subtracting the minimum and dividing by the range

df.long <- subset_df %>% 
  pivot_longer(Distance:TESSmagnitude, names_to = 'variable', values_to = 'value', values_drop_na = TRUE) %>% 
  group_by(variable) %>% 
  mutate(value_norm = value - min(value), 
         value_norm = value_norm / max(value_norm)
  )

ggplot(data = df.long, aes(x = variable, y = value_norm, fill = variable)) +
  geom_boxplot() +
  labs(x="Parameter",y="Normalized Values", 
      caption='Fig. 1: Boxplots depicting the distributions of parameter values.') + 
  ggtitle("Boxplots of Parameter Values for Nearby Stars") +
  theme(plot.title = element_text(size=14, face="bold")) +
  theme(plot.caption= element_text(size=12,hjust = 0, vjust = 0))

As shown in Fig. 1, the Distance and Log-g distributions are centered around higher values with a skew toward smaller values. The Luminosity, Mass, Radius and Teff parameter distributions are centered around small values and are somewhat skewed toward larger values. For the TESSmagnitude, the distribution is centered around higher values and is skewed to even higher values.

What this all suggests is that the nearby stars are mostly at larger distances, are smaller (e.g., Log-g, Radius, Mass) and cooler and less bright (i.e., higher magnitude values are related to more distant and/or less luminous stars).

Scatterplot Matrix and Correlation Coefficients

For a given set of variables, the scatterplot matrix contains all the pairwise scatter plots of the variables on a single diagram in a matrix format. Also shown on the diagonal in Figure 2, below, are the density functions of each parameter. The upper-right triangle displays the correlation coefficient for each pair of parameters.

# requires library GGally

ggpairs(subset_df, title='Scatterplot Matrix for Local Stars Data') +
        labs(caption = "Fig. 2: Pairwise scatterplots and correlation values.") +
        theme(plot.title = element_text(size=14, face="bold")) +
        theme(plot.caption= element_text(size=12,hjust= 0, vjust = 0))

In Figure 2 above, the scatterplot of Teff versus Mass is highly linear and has the highest correlation, r = 0.974. Other highly linear relationships are Log-g and Radius (r = -0.954), as well as Radius and Mass (r = -0.958). The scatterplot for Mass and Luminosity appears to show a power-law relationship.
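
One way to check for a power-law relationship is to fit a straight line on log-log axes, where the slope estimates the exponent. The sketch below uses synthetic data rather than the TIC values: the exponent of 3.5, the scatter, and the variable names are assumptions chosen only to demonstrate the technique, with the mass range taken from the summary table.

```r
set.seed(1)

# Synthetic data (NOT from the TIC): luminosity = mass^3.5 with log-normal scatter
mass       <- runif(200, min = 0.1, max = 2.4)
luminosity <- mass^3.5 * exp(rnorm(200, sd = 0.1))

# A power law L = c * M^k becomes log10(L) = log10(c) + k * log10(M)
fit      <- lm(log10(luminosity) ~ log10(mass))
exponent <- unname(coef(fit)[2])   # estimate of k; close to the assumed 3.5
```

Applied to the Mass and Luminosity columns of star_df (after dropping rows with missing or non-positive values), the same fit would give an empirical exponent for the nearby stars.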

Bar Chart of Spectral Types

The bar chart in Figure 3, below, displays the percentage of nearby stars in each spectral type in green. Also shown, in blue, are the percentages for all the stars in our galaxy.

spectralTypes <- c('O', 'B', 'A', 'F', 'G', 'K', 'M')

# from https://www.star-facts.com/types-of-stars/
# the percent of all stars in our galaxy by spectral type
spTypePct <- c(0.0003, 0.13,0.6, 3, 7.6, 12.1, 76.45)

# compute the pct by spectral type for the nearby stars
sp_type_df <- as.data.frame(na.omit(star_df$Spectral_Group))
total_count <- nrow(sp_type_df)
pctLocalStars <- c(sum(sp_type_df == 'O'), sum(sp_type_df == 'B'), sum(sp_type_df == 'A'), sum(sp_type_df == 'F'),
                       sum(sp_type_df == 'G'), sum(sp_type_df == 'K'),sum(sp_type_df == 'M'))*100/total_count

Star_Group = c('All Stars','All Stars','All Stars','All Stars','All Stars','All Stars','All Stars',
              'Local Stars','Local Stars','Local Stars','Local Stars','Local Stars','Local Stars','Local Stars')
Spectral_Class = c(spectralTypes, spectralTypes)
pctSpectralClass = c(spTypePct, pctLocalStars)

# assemble data frame of pct's for both all stars and nearby stars
spectralClass_df = data.frame(Star_Group, Spectral_Class, pctSpectralClass)

# order the spectral types for plotting
spectralClass_df$Spectral_Class = factor(spectralClass_df$Spectral_Class, levels = spectralTypes)


# uses library scales to convert from counts to percent format

ggplot(spectralClass_df, aes(y=pctSpectralClass, x=Spectral_Class, fill=Star_Group)) + 
  geom_bar(position='dodge', stat='identity') + 
  ggtitle("Spectral Types of Nearby Stars") +
  labs(x="Spectral Type",y="Percent",
       caption='Fig. 3: The percentage of nearby stars by spectral type compared with all stars.') + 
  scale_fill_manual(values = c("cornflowerblue", 'darkgreen')) +
  theme(plot.title = element_text(size=14, face="bold")) +
  theme(plot.caption= element_text(size=12,hjust = 0, vjust = 0))

As seen in Figure 3, there are relatively few very hot stars (i.e., types O and B) in either population. Per the summary table above, K-type stars make up about 26.5% of the nearby stars with an assigned Spectral Group, roughly twice their 12.1% share among all stars. By comparison, the still cooler M-type stars make up about 59.8% of the classified nearby stars, compared with 76.45% among all stars.

Proportion of Spectral Types by Distance

Figure 4, below, displays the relative proportion of each spectral type for each 2-ly bin of distance from the sun. As can be seen in Figure 4, the total count of stars increases with distance. This is due to the volume of a 2-ly thick spherical shell increasing with its radius (i.e., the Distance from the sun). However, for distances beyond approximately 25 ly, the relative proportions of the spectral types are roughly the same.

# set up data frame for counting spectral group by distance
star_density_df <- star_df[ , c('Distance','Spectral_Group')]
star_density_df <- na.omit(star_density_df)

# order the spectral groups
star_density_df$Spectral_Group <- factor(star_density_df$Spectral_Group,levels=c('B', 'A', 'F', 'G', 'K', 'M'))

# plot the histogram
ggplot(star_density_df, aes(x=Distance, fill=Spectral_Group)) + 
  geom_histogram(binwidth=2) +
  ggtitle("Stellar Spectral Group Counts by Distance from Sun") +
  labs(x="Distance (ly)",y="Count",
        caption='Fig. 4: The count of stars spectral types by distance from the sun.') + 
  xlim(0, 100) +
  scale_fill_manual(values=c('deepskyblue','cornflowerblue','yellowgreen','yellow','orange','red2')) +
  theme(plot.title = element_text(size=14, face="bold")) +
  theme(plot.caption= element_text(size=12,hjust = 0, vjust = 0))

Average Stellar Spatial Density by Distance

The nearby stars are distributed in three dimensions about our sun, each at a different distance. The average stellar density here is computed by counting the number of stars that lie within each 2-ly thick spherical shell centered on the sun and dividing by the volume of space in that shell. Only stars with an assigned Spectral Group are counted, because rows with missing values are dropped from star_density_df. A linear regression of the resulting densities is shown in Figure 5, below.

#perform data binning by distance
cuts <- cut(star_density_df$Distance, breaks=seq(0,100, by=1))
count <- c(t(table(cuts)))
count_df <- data.frame(count)
count_df$Distance <- 1:nrow(count_df)

# compute volumes of 2-ly thick spherical shell by radius (= distance)
volume <- matrix(NA,nrow=100, ncol=1)
volume_df <- data.frame(volume)
for(i in 1:nrow(volume_df)) {
  volume_df[i,1] <- 4*pi*(i^3 - (i-2)^3)/3.
} 

#compute star density
count_df$Density<- 1000.*count_df$count/volume_df$volume          # scale density to 1000 ly^3
count_df$x <- count_df$Distance
count_df$y <- count_df$Density



# From stack overflow link https://stackoverflow.com/questions/7549694/add-regression-line-equation-and-r2-on-graph
# function to convert the regression line equation and R^2 to a string so that the text can be added to the graph
lm_eqn <- function(df){
    m <- lm(y ~ x, df);
    eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2, 
         list(a = format(unname(coef(m)[1]), digits = 2),
              b = format(unname(coef(m)[2]), digits = 2),
              r2 = format(summary(m)$r.squared, digits = 3)))
    as.character(as.expression(eq));
}

ggplot(count_df,aes(Distance, Density)) +
  geom_point(color='blue', size=0.5) +
#  stat_summary(fun.data=mean_cl_normal) + 
  ggtitle("Density of Nearby Stars by Distance") +
  geom_smooth(method='lm', formula= y~x, color = 'red') +
  geom_text(x =75, y = 3.5, label = lm_eqn(count_df), parse = TRUE) +
  labs(x="x = Distance (ly)",y=expression(paste("y = Stars per 1,000 ", ly^{3})),
          caption='Fig. 5: The spatial density of nearby stars as a function of distance from the sun.') + 
  theme(plot.title = element_text(size=14, face="bold")) +
  theme(plot.caption= element_text(size=12,hjust = 0, vjust = 0))

In Figure 5, the actual calculated stellar density values have been overplotted on the regression line. The grey-shaded region about the red regression line is the 95% confidence interval. As can be seen, the linear relationship is quite a good fit for distances greater than about 27-ly, with the stellar density roughly 0.83 stars per 1000 cubic-ly (equivalent to a box 10-ly on each side).

For distances smaller than 27 ly, however, the stellar density values become much more scattered. For distances less than about 10-ly, the stellar density values are mostly zero. In other words, the space around the sun is notably less dense than other regions of nearby stars.
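
The shell-counting estimate described in this section can be sketched on synthetic data, where the true density is known. In the sketch below the star positions are simulated, not taken from the TIC; the density of 0.004 stars per cubic ly is an assumed illustrative value; and the bin width used for counting is deliberately the same as the thickness of the shells whose volumes divide the counts.

```r
set.seed(42)

true_density <- 0.004   # assumed illustrative value, stars per cubic ly
max_radius   <- 100     # ly
n_stars      <- round(true_density * 4 / 3 * pi * max_radius^3)

# distances of points spread uniformly through a sphere: r = R * u^(1/3)
distance <- max_radius * runif(n_stars)^(1 / 3)

# count the stars in 2-ly thick shells
shell_width <- 2
edges  <- seq(0, max_radius, by = shell_width)
counts <- as.vector(table(cut(distance, breaks = edges)))

# volume of each shell from its inner and outer radius
inner        <- edges[-length(edges)]
outer        <- edges[-1]
shell_volume <- 4 * pi * (outer^3 - inner^3) / 3

# stars per 1,000 cubic ly in each shell
density_per_1000 <- 1000 * counts / shell_volume
```

The outer shells recover the assumed density (about 4 stars per 1,000 cubic ly), while the innermost shells scatter widely because their small volumes contain very few stars.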

Conclusions

The population of stars near our sun (i.e., less than 100 ly in distance) does not include proportionally as many very hot stars of spectral classes O and B (i.e., white and blue-white) as the rest of the galaxy, and is dominated by much cooler M stars (i.e., red), which make up about 60% of the nearby stars with a measured Teff. Also, the average stellar density of the space near our sun is smaller than the average stellar density for the remainder of the nearby stars.

In other words, we live in a more “rural” section of the space within 100 ly, in a stellar population dominated by cool, low-luminosity M-type stars with temperatures between 2200 K and 3500 K.

Acknowledgment:

This paper includes data collected by the TESS mission, which are publicly available from the Mikulski Archive for Space Telescopes (MAST). Funding for the TESS mission is provided by NASA’s Science Mission directorate.

Andrew Cox

2/25/2022
