library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# dplyr and ggplot2 are already attached by library(tidyverse) above.
nasa_data <- read_delim("C:/Users/imaya/Downloads/cleaned_5250.csv", delim = ",")
## Rows: 5250 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, planet_type, mass_wrt, radius_wrt, detection_method
## dbl (8): distance, stellar_magnitude, discovery_year, mass_multiplier, radiu...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(nasa_data)
## # A tibble: 6 × 13
## name distance stellar_magnitude planet_type discovery_year mass_multiplier
## <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 11 Coma… 304 4.72 Gas Giant 2007 19.4
## 2 11 Ursa… 409 5.01 Gas Giant 2009 14.7
## 3 14 Andr… 246 5.23 Gas Giant 2008 4.8
## 4 14 Herc… 58 6.62 Gas Giant 2002 8.14
## 5 16 Cygn… 69 6.22 Gas Giant 1996 1.78
## 6 17 Scor… 408 5.23 Gas Giant 2020 4.32
## # ℹ 7 more variables: mass_wrt <chr>, radius_multiplier <dbl>,
## # radius_wrt <chr>, orbital_radius <dbl>, orbital_period <dbl>,
## # eccentricity <dbl>, detection_method <chr>
Do planets farther from their star have longer orbital periods?
# Bin orbital radius into right-closed intervals: (0,1], (1,5], (5,50], (50,400]
orbit_b <- c(0, 1, 5, 50, 400)
orbit_l <- c("Very Close", "Close", "Far", "Very Far")
nasa_data <- nasa_data |>
  mutate(orbit_cat = cut(orbital_radius,
                         breaks = orbit_b,
                         labels = orbit_l,
                         right = TRUE))
nasa_data |>
  group_by(orbit_cat) |>
  summarise(avg_period = round(mean(orbital_period, na.rm = TRUE), 4),
            count = n())
## # A tibble: 5 × 3
## orbit_cat avg_period count
## <fct> <dbl> <int>
## 1 Very Close 0.0898 4235
## 2 Close 3.96 546
## 3 Far 45.7 136
## 4 Very Far 4160. 29
## 5 <NA> 7849. 304
The results are consistent with Kepler’s Third Law, which relates the size of a planet’s orbit around its star to the planet’s orbital period (NASA, 2024). Planets in the “Very Close” category have an average orbital period of 0.0898, the shortest of any group, so planets closer to their star complete an orbit more quickly. As the orbit size increases, the average orbital period also increases, from 3.9582 (Close) to 45.6809 (Far) and up to 4160.2966 (Very Far). This is a strong positive relationship between orbital radius and orbital period: planets farther from their star take much longer to orbit. The 304 planets in the NA row either have no recorded orbital radius or have a radius outside the binned range of 0 to 400, so they are left out of this comparison.
Reference
NASA. (2024, May 2). Orbits and Kepler’s Laws. NASA Science. https://science.nasa.gov/solar-system/orbits-and-keplers-laws/
Do planets with more elliptical orbits tend to orbit farther from their star?
# Eccentricity should lie between 0 and 1, so list any negative values.
neg_rows <- nasa_data |> filter(eccentricity < 0)
neg_rows
## # A tibble: 3 × 14
## name distance stellar_magnitude planet_type discovery_year mass_multiplier
## <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 HD 1559… 90 7 Neptune-li… 2022 7.50
## 2 HD 2177… 181 7.78 Neptune-li… 2022 7.36
## 3 HD 9335… 183 9.12 Super Earth 2022 3.54
## # ℹ 8 more variables: mass_wrt <chr>, radius_multiplier <dbl>,
## # radius_wrt <chr>, orbital_radius <dbl>, orbital_period <dbl>,
## # eccentricity <dbl>, detection_method <chr>, orbit_cat <fct>
# Bin eccentricity; include.lowest = TRUE keeps values of exactly 0 in the first bin
ecc_b <- c(0, 0.1, 0.3, 0.7, 1)
ecc_l <- c("Nearly Circular", "Slightly Elliptical",
           "Highly Elliptical", "Extremely Elliptical")
nasa_data <- nasa_data |>
  mutate(eccentricity_level = cut(eccentricity,
                                  breaks = ecc_b,
                                  labels = ecc_l,
                                  right = TRUE,
                                  include.lowest = TRUE))
nasa_data |>
  group_by(eccentricity_level) |>
  summarise(avg_radius = round(mean(orbital_radius, na.rm = TRUE), 4),
            avg_period = round(mean(orbital_period, na.rm = TRUE), 4),
            count = n())
## # A tibble: 5 × 4
## eccentricity_level avg_radius avg_period count
## <fct> <dbl> <dbl> <int>
## 1 Nearly Circular 7.85 587. 4256
## 2 Slightly Elliptical 1.68 4.65 628
## 3 Highly Elliptical 5.68 34.1 309
## 4 Extremely Elliptical 8.30 73.7 54
## 5 <NA> 0.0343 0.0063 3
The table has three planets in the NA category because their eccentricity values are negative and so fall outside the bins. Eccentricity cannot be negative, so these values likely reflect a mistake in the calculation or in how the data was recorded. Across the remaining categories, there is no clear trend between how elliptical the orbit is and the orbital radius. For example, planets that are “Nearly Circular” have an average radius of 7.8470, while “Slightly Elliptical” planets have a much smaller average radius of 1.6779. Although “Extremely Elliptical” planets have a higher average radius of 8.2984, the pattern is not consistent across the categories. This suggests that there is not a strong relationship between orbital eccentricity and how far a planet orbits from its star. The shape of an orbit may be influenced more by other properties of the planet or by gravitational interactions than by distance from the star alone.
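The binning behaviour described above can be checked on a few made-up values. This is a minimal sketch, not part of the original analysis: `ecc_demo`, `with_lowest`, and `without_lowest` are hypothetical names, and the values are chosen only to hit the edge cases (a negative value, exactly 0, and values on a break).

```r
# Hypothetical eccentricity values chosen to hit each edge case.
ecc_demo <- c(-0.1, 0, 0.05, 0.3, 0.31, 1)

ecc_b <- c(0, 0.1, 0.3, 0.7, 1)
ecc_l <- c("Nearly Circular", "Slightly Elliptical",
           "Highly Elliptical", "Extremely Elliptical")

# Same arguments as the report: right-closed bins, lowest break inclusive.
with_lowest <- cut(ecc_demo, breaks = ecc_b, labels = ecc_l,
                   right = TRUE, include.lowest = TRUE)

# Without include.lowest, an eccentricity of exactly 0 is also dropped.
without_lowest <- cut(ecc_demo, breaks = ecc_b, labels = ecc_l,
                      right = TRUE)

# Side-by-side view of how each value is binned.
data.frame(ecc_demo, with_lowest, without_lowest)
```

In both versions the negative value becomes NA. The only difference is that `include.lowest = TRUE` keeps an eccentricity of exactly 0 in the “Nearly Circular” bin.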
ggplot(nasa_data, aes(x = orbit_cat, y = orbital_period)) +
  geom_boxplot() +
  scale_y_log10() +  # periods span several orders of magnitude
  labs(
    title = "Orbital Period by Orbital Radius Level",
    x = "Orbital Radius Level",
    y = "Orbital Period"
  )
The boxplot, drawn with orbital period on a log scale, shows a clear increasing trend between orbital radius and orbital period. Planets in the “Very Close” category have much shorter orbital periods than those in the other categories, and each category sits higher than the one before it. This is Kepler’s Third Law in action: as the orbital distance from the star increases, the orbital period also increases.
ggplot(nasa_data, aes(x = eccentricity_level, y = orbital_radius)) +
  geom_boxplot() +
  scale_y_log10() +  # radii span several orders of magnitude
  labs(
    title = "Orbital Radius by Eccentricity Level",
    x = "Eccentricity Level",
    y = "Orbital Radius"
  )
## Warning: Removed 289 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
The boxplot seems to show a slight increasing trend between eccentricity level and orbital radius, but this pattern may not reflect a real relationship. The three planets with negative eccentricity remain in the NA category: include.lowest = TRUE only places planets with an eccentricity of exactly 0 in the “Nearly Circular” category, so it does not bring the negative values into the other groups. The plot also drops 289 rows whose orbital radius is missing or cannot be shown on a log scale. Overall, the plot suggests that eccentricity does not strongly determine how far a planet orbits from its star, and the apparent trend could be driven by a few unusual values.
nasa_data |>
  summarise(correlation = cor(orbital_radius, orbital_period,
                              use = "complete.obs", method = "pearson"))
## # A tibble: 1 × 1
## correlation
## <dbl>
## 1 0.953
nasa_data |>
  summarise(correlation = cor(eccentricity, orbital_radius,
                              use = "complete.obs", method = "pearson"))
## # A tibble: 1 × 1
## correlation
## <dbl>
## 1 -0.00896
The correlation between orbital radius and orbital period is very strong (0.95), which shows that planets farther from their star tend to take much longer to complete an orbit, consistent with Kepler’s Third Law.
The correlation between eccentricity and orbital radius is essentially zero (-0.009), indicating that how elliptical a planet’s orbit is does not have a meaningful relationship with how far it is from its star.
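Pearson correlation measures linear association, whereas Kepler’s Third Law is a power law: for planets orbiting the same star, the square of the period is proportional to the cube of the orbital radius, so log(period) rises by 1.5 for every unit of log(radius). The sketch below illustrates this on made-up data and is not part of the original analysis. The data frame `kepler_demo` and its values are hypothetical, and the periods are generated directly from the law under the assumption of radii in AU, periods in years, and a one-solar-mass star.

```r
# Hypothetical orbital radii; periods follow T = a^1.5 exactly.
# (This assumes radii in AU, periods in years, and a one-solar-mass star.)
kepler_demo <- data.frame(radius = c(0.05, 0.4, 1, 5.2, 30))
kepler_demo$period <- kepler_demo$radius^1.5

# Pearson correlation on the raw values versus on the log-log scale.
cor_raw <- cor(kepler_demo$radius, kepler_demo$period)
cor_log <- cor(log10(kepler_demo$radius), log10(kepler_demo$period))

# The slope of the log-log fit estimates the exponent in the power law.
fit <- lm(log10(period) ~ log10(radius), data = kepler_demo)
slope <- unname(coef(fit)[2])

c(cor_raw = cor_raw, cor_log = cor_log, slope = slope)
```

On the report’s data, a comparable check would fit `log10(orbital_period)` against `log10(orbital_radius)` after dropping missing and non-positive values. Because host stars differ in mass, the slope there would be expected to land near 1.5 rather than exactly on it.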
orbital_period_m <- mean(nasa_data$orbital_period, na.rm = TRUE)
orbital_period_sd <- sd(nasa_data$orbital_period, na.rm = TRUE)
n <- sum(!is.na(nasa_data$orbital_period))
error_margin <- qt(0.975, df = n-1) * orbital_period_sd / sqrt(n)
ci_low_p <- orbital_period_m - error_margin
ci_up_p <- orbital_period_m + error_margin
ci_low_p
## [1] 24.48456
ci_up_p
## [1] 933.8172
At a 95% confidence level, the interval for the mean orbital period runs from about 24 to about 934. In other words, we are 95% confident that the true average orbital period for planets falls between 24 and 934. The interval is wide because orbital periods vary enormously: the “Very Close” group averages 0.0898 while the “Very Far” group averages about 4160, so a small number of very long periods inflate the standard deviation.
orbital_rad_m <- mean(nasa_data$orbital_radius, na.rm = TRUE)
orbital_rad_sd <- sd(nasa_data$orbital_radius, na.rm = TRUE)
n_rad <- sum(!is.na(nasa_data$orbital_radius))
em_rad <- qt(0.975, df = n_rad-1) * orbital_rad_sd / sqrt(n_rad)
ci_low_rad <- orbital_rad_m - em_rad
ci_up_rad <- orbital_rad_m + em_rad
ci_low_rad
## [1] 3.103156
ci_up_rad
## [1] 10.82273
At a 95% confidence level, the interval for the mean orbital radius runs from about 3.1 to about 10.8, so we are 95% confident that the true average orbital radius for planets falls between these values. Orbital radius and orbital period are measured in different units, so the two interval widths cannot be compared directly. Relative to the sample mean, though, the margin of error is smaller for orbital radius (about 55% of the mean) than for orbital period (about 95%). This suggests that orbital radii are somewhat more consistent than orbital periods, though there is still considerable variability.
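Both intervals above repeat the same calculation: mean plus or minus a t quantile times the standard error. A small helper can express it once. This is a sketch only; `mean_ci()` and `demo_values` are hypothetical names that do not appear in the original analysis.

```r
# Hypothetical helper: t-based confidence interval for the mean of a vector.
mean_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                      # drop missing values first
  n <- length(x)
  # Margin of error: t quantile times the standard error of the mean.
  margin <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - margin, mean = mean(x), upper = mean(x) + margin)
}

# Made-up values, including one NA, to show the helper in use.
demo_values <- c(0.5, 1.2, 3.4, 11.9, NA, 164.8)
mean_ci(demo_values)
```

Since it performs the same steps, `mean_ci(nasa_data$orbital_period)` and `mean_ci(nasa_data$orbital_radius)` would be expected to reproduce the bounds reported above.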