Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Internal Notes1
write_csv(stars, "stars.csv", na="")
Internal Notes2
#?write.csv
Internal Notes3
#write.csv(data1, "stars.csv")
Get working directory
getwd()
[1] "C:/Users/naomi/OneDrive/Desktop/Desktop of 11-08-2022/Community College Classes/DATA 110/DS Lab"
Read the selected data as a csv file
stars <- read_csv("stars.csv")
Rows: 96 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): star, type
dbl (2): magnitude, temp
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Review the first few rows of the data
head(stars)
# A tibble: 6 × 4
star magnitude temp type
<chr> <dbl> <dbl> <chr>
1 Sun 4.8 5840 G
2 SiriusA 1.4 9620 A
3 Canopus -3.1 7400 F
4 Arcturus -0.4 4590 K
5 AlphaCentauriA 4.3 5840 G
6 Vega 0.5 9900 A
Categorize levels and create new column using case when conditions
stars <- stars %>%
  mutate(stars = case_when(
    # Define the conditions for each star category
    temp > 6000 & magnitude < 1 ~ "Dwarfs",
    temp < 6000 & magnitude > 10 ~ "Giants",
    temp < 10000 & magnitude > 100 ~ "Supergiants",
    TRUE ~ "Super Stars"
  ))
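As a rough check of how these conditions behave, the sketch below applies the same rules to six rows copied from the printed output in this report. `case_when()` uses the first condition that matches, and anything unmatched falls through to `TRUE`. The names `sample_stars`, `labelled`, `category`, and `n_supergiants` are illustrative and not part of the original analysis. It also suggests that the "Supergiants" rule is unlikely to fire on this dataset, since it requires a magnitude above 100.

```r
library(dplyr)

# Six rows copied from the head() and tail() output in this report
sample_stars <- tibble(
  star = c("Sun", "SiriusA", "Canopus", "Arcturus", "*40EridaniB", "EVLacertae"),
  magnitude = c(4.8, 1.4, -3.1, -0.4, 11.1, 11.7),
  temp = c(5840, 9620, 7400, 4590, 10000, 2800)
)

# Same rules as the report; the first matching condition wins
labelled <- sample_stars %>%
  mutate(category = case_when(
    temp > 6000 & magnitude < 1 ~ "Dwarfs",
    temp < 6000 & magnitude > 10 ~ "Giants",
    temp < 10000 & magnitude > 100 ~ "Supergiants",
    TRUE ~ "Super Stars"
  ))

# Tally how many sample rows landed in each category
category_counts <- count(labelled, category)
print(category_counts)

# Count sample rows that satisfy the "Supergiants" condition on its own
n_supergiants <- sum(sample_stars$temp < 10000 & sample_stars$magnitude > 100)
print(n_supergiants)
```

Running `count(stars, stars)` on the full data frame would give the same kind of tally for all 96 rows.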
Review updated dataset
head(stars)
# A tibble: 6 × 5
star magnitude temp type stars
<chr> <dbl> <dbl> <chr> <chr>
1 Sun 4.8 5840 G Super Stars
2 SiriusA 1.4 9620 A Super Stars
3 Canopus -3.1 7400 F Dwarfs
4 Arcturus -0.4 4590 K Super Stars
5 AlphaCentauriA 4.3 5840 G Super Stars
6 Vega 0.5 9900 A Dwarfs
View the last rows of information out of curiosity
tail(stars)
# A tibble: 6 × 5
star magnitude temp type stars
<chr> <dbl> <dbl> <chr> <chr>
1 *40EridaniA 6 4900 K Super Stars
2 *40EridaniB 11.1 10000 DA Super Stars
3 *40EridaniC 12.8 2940 M Giants
4 *70OphiuchiA 5.8 4950 K Super Stars
5 *70OphiuchiB 7.5 3870 K Super Stars
6 EVLacertae 11.7 2800 M Giants
Plot the Scatterplot where temp is on the x-axis, magnitude is on the y-axis, and the newly created category will be plotted by color to differentiate type of stars
p <- ggplot(stars, aes(x = temp, y = magnitude, color = stars)) +
  geom_point(size = 3) + # size of the plotted data points
  geom_hline(yintercept = 0, linetype = "dashed", color = "purple") + # dashed purple line at magnitude 0 to set off the negative values
  labs(
    title = "Star's Magnitude: Investigating Incremental Levels",
    x = "Temperatures", # name of x-axis reflected in the visualization
    y = "Magnitude", # name of y-axis reflected in the visualization
    color = "stars" # legend title for the new category
  ) +
  theme_minimal() + # use a different style rather than the default theme
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), # centered, bold title
    axis.title.x = element_text(size = 12, face = "bold"), # x-axis title style
    axis.title.y = element_text(size = 12, face = "bold"), # y-axis title style
    legend.position = "top" # position the legend at the top
  )
Map size differentiation in magnitude
stars <- stars %>%
  mutate(temp_Size = case_when(
    magnitude > 14 ~ 10, # size 10 when magnitude is greater than 14
    magnitude > 10 ~ 3, # size 3 when magnitude is greater than 10
    TRUE ~ 1 # size 1 when the value does not fit the other criteria
  ))
Calculate the Correlation Coefficient to analyze the relationship between variables
Calculate the Adjusted R-Squared to determine how much variation in the dependent variable is explained by variation in the independent variable
stars_model <- lm(temp ~ magnitude, data = stars) # fit a linear model; stored under its own name so the stars data frame is not overwritten
summary_stars <- summary(stars_model)
adj_r_squared <- summary_stars$adj.r.squared # adjusted R-squared from the model summary
print(paste("Adjusted R-squared: ", adj_r_squared))
[1] "Adjusted R-squared: 0.394557510155724"
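The correlation coefficient itself can be computed with base R's `cor()`. The sketch below is only an illustration on the twelve rows printed by `head()` and `tail()` in this report, so its numbers will differ from those for the full 96-row dataset. The names `sample_stars`, `r`, `fit`, and `fit_summary` are assumptions made for the example. It also shows that, with a single predictor, the squared correlation matches the unadjusted R-squared of the linear model.

```r
# Twelve rows copied from the head() and tail() output (not the full dataset)
sample_stars <- data.frame(
  magnitude = c(4.8, 1.4, -3.1, -0.4, 4.3, 0.5, 6, 11.1, 12.8, 5.8, 7.5, 11.7),
  temp = c(5840, 9620, 7400, 4590, 5840, 9900, 4900, 10000, 2940, 4950, 3870, 2800)
)

# Pearson correlation between temperature and magnitude
r <- cor(sample_stars$temp, sample_stars$magnitude)

# Same linear model form as the report, fitted on the sample rows
fit <- lm(temp ~ magnitude, data = sample_stars)
fit_summary <- summary(fit)

print(r)
print(fit_summary$r.squared)     # equals r^2 for a one-predictor model
print(fit_summary$adj.r.squared) # penalized for the number of predictors
```

On the full data, the equivalent call would be `cor(stars$temp, stars$magnitude)`, run while `stars` still refers to the data frame.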
Convert ggplot into Plotly
p_plotly <- ggplotly(p) %>%
  layout(legend = list(
    x = 0.8, y = 0.9, # legend position as a fraction of the plot area
    bgcolor = "rgba(255, 255, 255, 0.5)" # semi-transparent white legend background
  ))
Run Plotly Output (Interactive Visualization)
p_plotly
Data Selection and Visualization
From the DS Labs dataset, I selected “stars.” The x-axis displays temperature, ranging from 2500 to 33600 Kelvin. The y-axis represents magnitude, which ranges from -8 to 17. Initially, there were too many designations in the “type” column, which could be confusing for a reader without an astronomy background. Therefore, I simplified it by creating a new column labeled ‘stars’ that groups the stars by temperature and magnitude into Dwarfs, Giants, and Super Stars, with names adapted from astronomical naming conventions (https://observatory.astro.utah.edu/Stars.html).
The plotly chart is my preferred interactive visualization. It lets the user hover over each point to see its key values. For example, hovering over the red point in the lower right corner shows the following information: temp: 33600, magnitude: -5.9, stars: Dwarfs.
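The hover text can also be tailored. As a minimal sketch, assuming the star name should appear in the tooltip, the example below maps `star` to the `text` aesthetic and passes a `tooltip` vector to `ggplotly()`. The data frame `sample_stars` holds a few rows copied from this report, and `category`, `p_sample`, and `p_sample_plotly` are hypothetical names used only here.

```r
library(ggplot2)
library(plotly)

# A few rows copied from the printed output in this report
sample_stars <- data.frame(
  star = c("Sun", "Canopus", "Vega", "EVLacertae"),
  magnitude = c(4.8, -3.1, 0.5, 11.7),
  temp = c(5840, 7400, 9900, 2800),
  category = c("Super Stars", "Dwarfs", "Dwarfs", "Giants")
)

# The text aesthetic is not drawn by ggplot2; ggplotly() can use it in the tooltip
p_sample <- ggplot(
  sample_stars,
  aes(x = temp, y = magnitude, color = category, text = star)
) +
  geom_point(size = 3)

# Show the star name first, then the x, y, and color values
p_sample_plotly <- ggplotly(p_sample, tooltip = c("text", "x", "y", "colour"))
p_sample_plotly
```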
Correlation Coefficient & Adjusted R-Squared
The correlation coefficient is -0.633, which means the linear relationship between temperature and magnitude is negative and moderate. The adjusted R-squared shows that magnitude explains only about 39.4% of the variation in temperature in this linear model, so other contributing factors may be at play.
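The two reported numbers are linked by simple arithmetic. The sketch below inverts the adjusted R-squared formula to recover the unadjusted R-squared and then the correlation coefficient. It assumes all 96 rows reported by `read_csv()` were used in the fit (no missing values) and a single predictor. The variable names are illustrative.

```r
n <- 96                             # rows reported by read_csv()
n_predictors <- 1                   # magnitude is the only predictor
adj_r_squared <- 0.394557510155724  # value printed in this report

# Invert: adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - n_predictors - 1)
r_squared <- 1 - (1 - adj_r_squared) * (n - n_predictors - 1) / (n - 1)

# For one predictor, |r| = sqrt(R^2); the sign is negative per the report
r <- -sqrt(r_squared)

print(round(r_squared, 3)) # 0.401
print(round(r, 3))         # -0.633
```

Under these assumptions, the result agrees with the reported correlation coefficient of -0.633.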
Analyzing the Graph
The red points are the hotter stars, with temperatures above 6,000. Their magnitudes are below 1, so they sit near or below the dashed line at 0. On the magnitude scale, lower values mean brighter stars, so these are the brightest points in the plot; this analysis labels them Dwarfs. Their temperatures span a wide horizontal range.
The blue points are generally cooler, with most temperatures below 10,000. A few are hotter than 10,000, and in those cases the magnitudes are also much higher. These characteristics best summarize the Super Stars, which also span a wide vertical range.
The green points have magnitudes over 10 and low temperatures, roughly in the 2000 to 3000 range. These stars are dim, because a higher magnitude means lower brightness. This analysis labels them Giants. In general, the lower the temperature and the higher the magnitude, the cooler and dimmer the star.