library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'ggplot2' was built under R version 4.5.2
## Warning: package 'tibble' was built under R version 4.5.2
## Warning: package 'tidyr' was built under R version 4.5.2
## Warning: package 'readr' was built under R version 4.5.2
## Warning: package 'purrr' was built under R version 4.5.2
## Warning: package 'dplyr' was built under R version 4.5.2
## Warning: package 'stringr' was built under R version 4.5.2
## Warning: package 'forcats' was built under R version 4.5.2
## Warning: package 'lubridate' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pwrss)
## Warning: package 'pwrss' was built under R version 4.5.2
##
## Attaching package: 'pwrss'
##
## The following object is masked from 'package:stats':
##
## power.t.test
# Local path to the cleaned dataset; adjust for your machine
nasa_data <- read_csv("C:/Users/imaya/Downloads/cleaned_5250.csv")
## Rows: 5250 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, planet_type, mass_wrt, radius_wrt, detection_method
## dbl (8): distance, stellar_magnitude, discovery_year, mass_multiplier, radiu...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(nasa_data)
## # A tibble: 6 × 13
## name distance stellar_magnitude planet_type discovery_year mass_multiplier
## <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 11 Coma… 304 4.72 Gas Giant 2007 19.4
## 2 11 Ursa… 409 5.01 Gas Giant 2009 14.7
## 3 14 Andr… 246 5.23 Gas Giant 2008 4.8
## 4 14 Herc… 58 6.62 Gas Giant 2002 8.14
## 5 16 Cygn… 69 6.22 Gas Giant 1996 1.78
## 6 17 Scor… 408 5.23 Gas Giant 2020 4.32
## # ℹ 7 more variables: mass_wrt <chr>, radius_multiplier <dbl>,
## # radius_wrt <chr>, orbital_radius <dbl>, orbital_period <dbl>,
## # eccentricity <dbl>, detection_method <chr>
Notes: This analysis uses the variables mass_multiplier and mass_wrt. mass_multiplier is the numerical value of the planet's mass, and mass_wrt names the reference body that value is expressed relative to (Jupiter or Earth). A planet's mass is therefore mass_multiplier times the mass of the reference body. For example, a multiplier of 2 with mass_wrt equal to Earth means twice Earth's mass.
To make masses comparable across planets, all of them were standardized to Jupiter masses, because some planets in the dataset have masses given relative to Earth and others relative to Jupiter. By standard astronomical conversion, 1 Jupiter mass is about 317.77 Earth masses, so a mass given relative to Earth is converted to Jupiter masses by dividing it by 317.77 (the code rounds this to 317.8).
Source: https://www.unitsconverters.com/en/Jupitermass-To-Massofearth/Unittounit-6003-173
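The conversion rule can be wrapped in a small function. This is a minimal sketch; `to_jupiter_mass()` is a hypothetical helper, not part of the original analysis, and it assumes the unit labels are exactly "Jupiter" and "Earth". Unlike a plain `ifelse()`, which sends every non-Jupiter value down the Earth branch, it returns `NA` for any other label so that unexpected units surface instead of being silently divided.

```r
# Hypothetical helper: convert a mass multiplier to Jupiter masses.
# Assumes unit labels are exactly "Jupiter" or "Earth"; anything else becomes NA.
to_jupiter_mass <- function(multiplier, unit, earth_per_jupiter = 317.77) {
  out <- rep(NA_real_, length(multiplier))
  is_jupiter <- unit %in% "Jupiter"  # %in% treats NA units as FALSE
  is_earth <- unit %in% "Earth"
  out[is_jupiter] <- multiplier[is_jupiter]
  out[is_earth] <- multiplier[is_earth] / earth_per_jupiter
  out
}

to_jupiter_mass(c(2, 317.77, 5, 1), c("Jupiter", "Earth", "Sun", NA))
# returns 2, 1, NA, NA
```

Passing `earth_per_jupiter = 317.8` reproduces the rounded divisor used in the conversion code exactly.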
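# Note: this model uses the raw mass_multiplier, before conversion to Jupiter masses,
# so Earth-relative and Jupiter-relative values are mixed in the response
lm_model <- lm(mass_multiplier ~ orbital_radius, data = nasa_data)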
summary(lm_model)
##
## Call:
## lm(formula = mass_multiplier ~ orbital_radius, data = nasa_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.26 -4.65 -2.28 1.62 745.47
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.411902 0.188514 34.013 <2e-16 ***
## orbital_radius 0.002151 0.001355 1.587 0.113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.24 on 4941 degrees of freedom
## (307 observations deleted due to missingness)
## Multiple R-squared: 0.0005096, Adjusted R-squared: 0.0003073
## F-statistic: 2.519 on 1 and 4941 DF, p-value: 0.1125
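The slope above is not significant (p = 0.113) and the model explains almost none of the variation (R-squared of about 0.0005), but the response mixes Earth-relative and Jupiter-relative values. A sketch of the same model on a single, standardized mass unit follows. It runs on a few made-up rows (`toy_planets` is hypothetical) so that it is self-contained; on the real data, the same `lm()` call would use `nasa_data` after `mass_jupiter` has been created. No result for the real data is implied.

```r
# Made-up toy rows standing in for nasa_data (hypothetical values)
toy_planets <- data.frame(
  mass_multiplier = c(1.2, 5.0, 317.77, 2.0, 0.5, 10.0),
  mass_wrt        = c("Jupiter", "Earth", "Earth", "Jupiter", "Jupiter", "Earth"),
  orbital_radius  = c(1.0, 0.1, 0.05, 3.2, 5.1, 0.2)
)

# Standardize to Jupiter masses before modelling
toy_planets$mass_jupiter <- ifelse(toy_planets$mass_wrt == "Jupiter",
                                   toy_planets$mass_multiplier,
                                   toy_planets$mass_multiplier / 317.77)

# Same model form as lm_model, but with one consistent mass unit
lm_jupiter <- lm(mass_jupiter ~ orbital_radius, data = toy_planets)
summary(lm_jupiter)$coefficients
```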
# Standardize all masses to Jupiter masses (Earth-relative values are divided by 317.8)
nasa_data$mass_jupiter <- ifelse(nasa_data$mass_wrt == "Jupiter",
                                 nasa_data$mass_multiplier,
                                 nasa_data$mass_multiplier / 317.8)
anova_model <- aov(mass_jupiter ~ detection_method, data = nasa_data)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## detection_method 10 40824 4082 30.84 <2e-16 ***
## Residuals 5216 690362 132
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 23 observations deleted due to missingness
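A significant F-test says only that mean mass differs somewhere among the 11 detection methods (10 degrees of freedom), not which methods differ. Tukey's HSD gives the pairwise comparisons, and because mass appears to have a long right tail (residuals up to about 745 in the regression above), Welch's ANOVA and the Kruskal-Wallis test are common robustness checks. These are suggested follow-ups, not part of the original analysis. The sketch uses R's built-in `PlantGrowth` data as a stand-in; substituting `mass_jupiter ~ detection_method` and `data = nasa_data` would apply it here.

```r
# Stand-in data: R's built-in PlantGrowth (weight by treatment group)
anova_demo <- aov(weight ~ group, data = PlantGrowth)
summary(anova_demo)

# Pairwise differences between groups, with family-wise error control
tukey_demo <- TukeyHSD(anova_demo)
tukey_demo

# Welch's ANOVA: does not assume equal variances across groups
welch_demo <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)

# Kruskal-Wallis: rank-based, less sensitive to heavy right skew
kw_demo <- kruskal.test(weight ~ group, data = PlantGrowth)
```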
boxplot(mass_jupiter ~ detection_method,
data = nasa_data,
col="lightblue",
main="Planet Mass by Detection Method",
xlab="Detection Method",
ylab="Mass (Jupiter Masses)")
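With a long right tail, most boxes on a linear axis are squashed near zero. One option is a log-scaled y-axis. The sketch below uses simulated log-normal masses for three made-up methods (the method names and values are hypothetical). Note that `log = "y"` requires strictly positive values, so any zero masses would have to be removed first; `las = 2` turns the group labels vertical, which helps with long detection-method names.

```r
set.seed(42)
# Simulated, right-skewed masses for three made-up detection methods
toy_masses <- data.frame(
  detection_method = rep(c("Method A", "Method B", "Method C"), each = 50),
  mass_jupiter = rlnorm(150, meanlog = rep(c(-3, 0, 1), each = 50), sdlog = 1.5)
)

# log = "y" spreads out small values; las = 2 turns the group labels vertical
bp <- boxplot(mass_jupiter ~ detection_method,
              data = toy_masses,
              log = "y",
              las = 2,
              col = "lightblue",
              main = "Planet Mass by Detection Method (log scale)",
              xlab = "",
              ylab = "Mass (Jupiter Masses, log scale)")
```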
ecc_fd <- nasa_data |>
filter(planet_type %in% c("Neptune-like", "Super Earth")) |>
drop_na(eccentricity)
ecc_summary <- ecc_fd |>
group_by(planet_type) |>
summarise(
mean_orbit = mean(eccentricity, na.rm = TRUE),
sd_orbit = sd(eccentricity, na.rm = TRUE),
n = n()
)
print(ecc_summary)
## # A tibble: 2 × 4
## planet_type mean_orbit sd_orbit n
## <chr> <dbl> <dbl> <int>
## 1 Neptune-like 0.0330 0.0926 1825
## 2 Super Earth 0.0172 0.0620 1595
ecc_fd$planet_type <- factor(ecc_fd$planet_type, levels = c("Neptune-like", "Super Earth"))
t_test_results <- t.test(
eccentricity ~ planet_type,
data = ecc_fd,
alternative = "greater",
var.equal = FALSE
)
t_test_results
##
## Welch Two Sample t-test
##
## data: eccentricity by planet_type
## t = 5.9425, df = 3209.5, p-value = 1.554e-09
## alternative hypothesis: true difference in means between group Neptune-like and group Super Earth is greater than 0
## 95 percent confidence interval:
## 0.01146117 Inf
## sample estimates:
## mean in group Neptune-like mean in group Super Earth
## 0.03300422 0.01715473
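With the formula interface, `t.test()` takes the difference as the first factor level minus the second, so the meaning of `alternative = "greater"` depends on the level order set with `factor()` just before the test. A small sketch on simulated, hypothetical data shows that reversing the levels and switching to `"less"` gives the same test.

```r
set.seed(1)
# Hypothetical two-group data where group "A" has the larger mean
toy_groups <- data.frame(
  group = rep(c("A", "B"), each = 30),
  y = c(rnorm(30, mean = 1), rnorm(30, mean = 0))
)

# Levels A, B: alternative = "greater" tests mean(A) - mean(B) > 0
toy_groups$group <- factor(toy_groups$group, levels = c("A", "B"))
t_ab <- t.test(y ~ group, data = toy_groups, alternative = "greater")

# Levels B, A: the same hypothesis must now be written as "less"
toy_groups$group <- factor(toy_groups$group, levels = c("B", "A"))
t_ba <- t.test(y ~ group, data = toy_groups, alternative = "less")

c(t_ab$p.value, t_ba$p.value)  # the same p-value either way
```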
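A one-sided Welch two-sample t-test compared mean orbital eccentricity between Neptune-like and Super Earth planets (null hypothesis: equal means; alternative: the Neptune-like mean is greater). The p-value (1.554e-09) is far below 0.05, so we reject the null hypothesis: Neptune-like planets have a higher mean eccentricity (0.0330) than Super Earth planets (0.0172). The difference is small in absolute terms, however; the one-sided 95% confidence interval only bounds it below at about 0.011.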
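With more than 3,000 planets in the comparison, even a small difference produces a tiny p-value, so an effect size helps judge how large the difference actually is. Cohen's d (not computed in the original analysis) can be estimated from the group summaries in `ecc_summary`. This sketch uses the pooled standard deviation, which assumes roughly equal variances even though the test itself was Welch's; `cohens_d()` is a hypothetical helper.

```r
# Hypothetical helper: Cohen's d from group summaries, using the pooled SD.
# Inputs are the rounded values printed in ecc_summary.
cohens_d <- function(m1, s1, n1, m2, s2, n2) {
  pooled_sd <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  (m1 - m2) / pooled_sd
}

d_ecc <- cohens_d(m1 = 0.0330, s1 = 0.0926, n1 = 1825,   # Neptune-like
                  m2 = 0.0172, s2 = 0.0620, n2 = 1595)   # Super Earth
round(d_ecc, 2)  # about 0.2
```

By Cohen's common rule of thumb, d of about 0.2 counts as a small effect.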
nasa_data |>
filter(planet_type %in% c("Neptune-like", "Super Earth")) |>
drop_na(eccentricity) |>
ggplot() +
geom_boxplot(
mapping = aes(
x = factor(planet_type, levels = c("Super Earth", "Neptune-like")),
y = eccentricity
),
notch = TRUE,
fill = "skyblue",
outlier.alpha = 0.2
) +
labs(
title = "Orbital Eccentricity of Neptune-like vs Super Earth Planets",
x = "Planet Type",
y = "Eccentricity"
) +
theme_minimal()
The difference is hard to see in this plot because most planets in the dataset have eccentricities close to zero, so the boxes are compressed against the bottom of the scale. The t-test above nonetheless indicates that Neptune-like planets have a higher mean eccentricity (0.0330) than Super Earth planets (0.0172).
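When most values pile up near zero, an empirical cumulative distribution function (ECDF) plot can show group differences that a boxplot hides, because it traces the full distribution of each group. The sketch below uses base R's `ecdf()` on simulated, zero-heavy data; the `sim_ecc()` helper and its parameters are made up for illustration. On the real data one would plot `ecc_fd$eccentricity` split by `planet_type`. Zooming the existing ggplot with `coord_cartesian(ylim = ...)` is another option, since it narrows the view without dropping points from the boxplot statistics.

```r
set.seed(7)
# Hypothetical simulator: a share of exact zeros plus a capped exponential tail
sim_ecc <- function(n, p_zero, rate) {
  ifelse(runif(n) < p_zero, 0, pmin(rexp(n, rate = rate), 0.9))
}
neptune_like <- sim_ecc(300, p_zero = 0.7, rate = 10)
super_earth  <- sim_ecc(300, p_zero = 0.8, rate = 15)

# Empirical CDFs: a curve lying lower and to the right means larger eccentricities
plot(ecdf(super_earth), do.points = FALSE, verticals = TRUE,
     col = "steelblue", main = "Eccentricity ECDF (simulated data)",
     xlab = "Eccentricity", ylab = "Cumulative proportion")
lines(ecdf(neptune_like), do.points = FALSE, verticals = TRUE, col = "darkorange")
legend("bottomright", legend = c("Super Earth", "Neptune-like"),
       col = c("steelblue", "darkorange"), lty = 1)
```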